Application of Mathematical Signal Processing
Techniques  to  Mission  Systems 

(RTO  EN-7) 


Executive  Summary 

Signal  processing  techniques  must  develop  substantially,  on  the  one  hand  in  order  to  respond  in  a  more 
relevant  way  to  more  demanding  operational  requirements,  and  on  the  other  to  obtain  maximum  benefit 
from  improvements  in  the  technologies  on  which  they  are  based,  whether  it  be  for  the  sensors  which  supply 
them,  or  the  data  processing  techniques  which  enable  their  implementation. 

With  regard  to  sensors  in  particular,  the  trend  is  to  use  the  signal  for  imaging,  at  increasingly  fine 
resolution,  with  generally  much  larger  fields.  Moreover,  processing  commonly  concerns  sequences  of 
images, with close integration of spatial and temporal dimensions. Present-day systems also tend to
multiply the number of sensors and frequency bands operated in close synergy, leading to multi-resolution
and heterogeneous data (reference frames, reliability, ...). The data available are thus increasing in volume,
in richness and in heterogeneity, and as a result are becoming more difficult to use.

Operational situations require the generation of increasingly accurate, robust and concise
information, under more and more difficult conditions and with shorter and shorter reaction
times. The data and their interactions must therefore be exploited rigorously, while at
the same time seeking the highest possible level of automation.

There  are  a  number  of  emerging  techniques  which  could  meet  these  requirements,  mostly  originating  in 
mathematical  theories  as  diverse  as  wavelets,  variational  methods  or  the  theory  of  evidence.  These 
techniques  cover  the  whole  processing  chain  fairly  evenly,  and  in  particular  signal  compression  and 
transmission,  data  extraction  and  interpretation,  and  decision-making  aids. 

JUSTIFICATION:  The  complementarity  of  the  different  emerging  techniques,  presented  in  the  most 
varied  mathematical  frameworks,  so  as  to  respond  to  what  is  a  critical  development  in  sensor  system 
integration  requirements,  should  produce  a  series  of  tools  capable  of  meeting  the  needs  expressed  at  all 
levels  of  the  processing  chain. 

SUBJECTS  EXAMINED:  This  Lecture  Series  presents  a  whole  range  of  perspectives  for  different  levels 
of  processing,  based  on  some  of  the  most  promising  techniques.  Particular  attention  will  be  paid  to  the 
following  subjects: 

- Wavelet analysis: summary of the possibilities; application to detection in structured natural backgrounds
and to the extraction of invariant primitives.

- The concept of Multirate Filter Banks in conjunction with the various transforms which this technique
enables; applications, in the field of transmission, to the compression of images and video sequences,
to noise rejection, to jamming and to coding.

- Variational methods based on partial differential equations for multi-scale processing of images and video
sequences; presentation of different image segmentation approaches.

- Multi-sensor processing based on the theory of evidence: processing of the functions of detection,
classification, matching of ambiguous observations, or tracking, with the aim of solving problems such
as data modelling, decision making, the management of heterogeneous reference frames, or the
integration of contextual knowledge.

The  material  in  this  publication  was  assembled  to  support  a  Lecture  Series  under  the  sponsorship  of  the 
Systems  Concepts  and  Integration  Panel  (SCI)  and  the  Consultant  and  Exchange  Programme  of  RTA 
presented  on  1-2  November  1999  at  DLR  Koln,  Germany,  on  4-5  November  1999  at  ONERA,  Paris, 
France,  and  9-10  November  1999  at  the  Naval  Post  Graduate  School,  Monterey,  United  States. 


The Application of Mathematical
Signal Processing Techniques
to Mission Systems

(RTO EN-7)


Summary


Signal processing techniques must evolve substantially, on the one hand to respond in a relevant way
to increasingly demanding operational needs, and on the other to draw full benefit
from improvements in the technologies on which they rest, whether the sensors that feed them or
the computing resources that allow them to be implemented.

For sensors in particular, the signal is evolving more and more towards imagery, with ever finer
resolution over generally larger fields. Most often it is sequences of images that must be processed,
with close integration of their temporal and spatial dimensions. Present-day systems
also multiply the number of sensors and frequency bands that have to be exploited in close
synergy, leading in particular to problems of multi-resolution and of data heterogeneity
(reference frames, reliability, ...). The available data are therefore growing in volume, in richness, in heterogeneity, and in
difficulty of exploitation.

Operational needs also require the production of increasingly precise, robust and concise information,
under adverse conditions that are often more difficult and with ever shorter reaction
times. The data and their synergies must therefore be exploited all the more rigorously, while
seeking the highest possible level of automation.

To meet these needs, a number of promising emerging techniques have been identified,
drawn from mathematical theories as varied as wavelets, variational methods and the theory of
evidence. These techniques cover the whole processing chain fairly evenly, in particular
signal compression and transmission, information extraction, interpretation, and decision support.

JUSTIFICATION: The complementarity of various promising emerging techniques, developed in
the most varied mathematical frameworks to respond to a critical evolution in requirements for
sensor system integration, makes it possible to envisage a set of tools suited to every
link in the processing chain.

SUBJECTS TO BE COVERED: The proposed Lecture Series aims to present a range of the perspectives offered
at the different levels of the processing chain, drawing on some of the most
promising techniques. The following subjects in particular will be addressed:

- Wavelet analysis: summary of the possibilities offered; application to detection in structured natural
backgrounds and to the extraction of invariant primitives;

- The concept of "Multirate Filter Banks" in conjunction with the various transforms that it makes it possible to
implement; applications, in the field of transmission, to the compression of images and video sequences, to
noise rejection, to jamming, and to coding;

- Variational methods based on partial differential equations for the processing of images and
multi-scale video sequences; presentation of different approaches to image segmentation;

- Multi-sensor processing based on the theory of evidence: processing of the functions of detection,
classification, matching of ambiguous observations, or tracking, aimed at solving problems
such as data modelling, decision making, the management of heterogeneous reference frames, or
the integration of contextual knowledge.

This publication was prepared as course material for Lecture Series 216, organised by the
RTO Systems Concepts and Integration (SCI) Panel, on 1-2 November 1999 at DLR (Germany), on 4-5 November 1999 at
ONERA (France), and on 9-10 November 1999 at the Naval Post Graduate School, Monterey (United States).
Introduction to Wavelet Analysis

G.H.Watson 

Room 1052, A2 Building, DERA Farnborough, Ively Road, Farnborough, Hants, GU14 0LX, UK


1.  Introduction 

This  paper  introduces  the  concepts  of  wavelet  analysis 
and  gives  an  overview  of  the  numerous  wavelet  analysis 
techniques  in  existence.  The  principal  aim  of  this  paper 
is  to  promote  an  awareness  of  wavelet  analysis,  not  to 
provide  technical  details,  as  the  latter  are  available  in 
many textbooks, for example [1,2]. Most of the
underlying principles are applicable to 1-dimensional
signal analysis, and there are straightforward methods to
adapt 1D wavelet analysis to higher-dimensional data,
also covered in this paper. Hence, much of this paper is
concerned with 1-dimensional signal analysis, even
though higher-dimensional data is of equal importance.
Major  topics  covered  in  this  paper  are  the  continuous 
wavelet  transform  and  its  inverse,  the  discrete  wavelet 
transform  and  its  relation  to  multiresolution  filter  banks, 
orthonormal  and  biorthogonal  wavelets,  image  wavelet 
analysis  and  wavelet  packets. 

We  begin  in  this  section  with  an  overview  of  what 
wavelet  analysis  is,  why  it  is  useful,  and  present  some 
common  applications.  Throughout  this  paper,  key  words 
and  phrases  are  highlighted  in  bold  text. 

Wavelet  analysis  is  the  extraction  of  signal  or  image 
information  at  different  positions  and  scales.  The  idea  is 
to  treat  all  positions  and  scales  on  an  equal  footing,  so 
that  an  object  will  be  analysed  in  the  same  way, 
regardless  of  whether  it  is  translated  or  dilated.  This 
approach  is  useful  because  translation  and  dilation  are 
natural  symmetries  that  occur  very  often  in  nature,  and 
in  signal  and  image  processing.  If  we  are  looking  for  an 
object,  we  generally  don’t  know  where  it  will  be,  and  in 
many  surveillance  applications  it  is  equally  likely  to  be 
anywhere  in  the  signal  or  image.  The  statistics  of  the 
signal  or  image  are  thus  translation-invariant,  otherwise 
known  as  being  stationary.  Similarly,  if  we’re  analysing 
signals  over  time,  we  don’t  know  when  an  event  will 
occur,  for  example  a  transient  sound  in  an  acoustic 
signal. 

Scale  invariance  is  also  important  in  signal  and  image 
processing,  but  the  reasons  are  sometimes  less  obvious. 
Sometimes  scale-invariant  processing  is  required 
because  the  objects  being  analysed  could  be  at  any 
range,  and  therefore  of  unknown  apparent  size,  or  the 
camera  may  have  a  zoom  facility  which  also  dilates  the 
image.  Similarly,  sounds  such  as  musical  notes  may 
have  variable  duration,  but  in  other  respects  are  similar. 
What is more subtle and interesting is the invariance of
many natural processes and scenes to dilation. Scenes
such  as  sky,  clouds,  mountains  and  forests  are  of  interest 
as  backgrounds  in  surveillance  and  detection.  It  should 
be  obvious  that  such  backgrounds  are  statistically 
independent  of  translation,  as  there  is  no  concept  of 
“absolute”  position.  This  is  similar  to  the  underlying 
principle  of  relativity,  although  the  latter  concerns  the 
laws  of  physics  and  also  invariance  to  constant  velocity 
changes. 

What  is  less  obvious  is  that  many  natural  scenes  are 
scale-invariant; when we observe such scenes as images,
the range or magnification is difficult to discern, unless
there are reference objects of known size. Even many
artefacts,  such  as  roads  and  buildings,  are  difficult  to 
scale.  Self-similar  objects  are  known  as  fractals,  and  the 
study  of  fractal  geometry  has  been  an  important  topic  of 
research  in  recent  decades  [3],  in  which  scale-invariance 
is  known  as  self-similarity.  There  are  many  physical 
processes  which  are  self-similar,  for  example  turbulence 
in  fluids,  and  wavelet  analysis  has  been  an  important 
tool  in  the  analysis  of  such  processes. 

There  are  natural  symmetries  other  than  translation  and 
dilation,  which  will  be  mentioned  in  Sections  7  and  8. 
Downward-looking  imagery  is  often  statistically 
rotation-invariant,  there  being  no  bias  in  orientation. 
Frequency  shifts  are  a  natural  symmetry  for  some  types 
of  noise,  for  example  Gaussian  white  noise. 

Another  important  requirement  of  wavelet  analysis  is 
resolution  in  position  and  scale,  so  that  objects  at 
different  positions  and  scales  can  be  analysed 
independently,  with  minimal  interference.  To  achieve 
this,  an  appropriate  basis  of  functions  is  required  for  the 
analysis.  The  most  primitive  basis  comprises  the  delta 
functions  which  return  the  sample  or  grey-scale  values  at 
each  point  or  pixel  in  the  signal  or  image.  Delta 
functions  are  best  at  resolving  position  but  cannot 
resolve  scale  or  frequency.  Conversely,  a  Fourier  basis, 
comprising  sinusoids  or  complex  exponentials,  is  best  at 
resolving  frequency,  but  cannot  resolve  position.  Neither 
of  these  bases  is  scale-invariant,  which  is  where  wavelet 
bases  come  in,  discussed  in  Section  2. 

We  conclude  this  section  with  some  applications  of 
wavelet  analysis,  to  demonstrate  the  practical  importance 
of  translation-  and  scale-invariant  processing. 


Paper presented at the RTO SCI Lecture Series on "Application of Mathematical Signal Processing Techniques
to Mission Systems", held in Koln, Germany, 1-2 November 1999; Paris, France, 4-5 November 1999;
Monterey, USA, 9-10 November 1999, and published in RTO EN-7.
1.1 Data Compression

Fig. A. Example of wavelet image compression on 'Lena'


Data  compression  is  perhaps  the  most  widely  used 
application  of  wavelet  analysis.  Most  real-life  images 
have  strong  phase  correlation,  like  edges,  and  are 
intermittent,  with  some  parts  being  smooth  and  other 
parts  rough,  or  with  sharp  edges.  With  the  delta  function 
basis  there  is  considerable  redundancy  in  the  smooth 
parts,  as  the  function  values  (sample  values)  are  similar, 
so  smooth  regions  require  a  basis  of  smooth  functions  to 
be  encoded  efficiently.  There  is  a  lot  of  low-frequency 
energy  in  smooth  signals,  which  suggests  that  a  Fourier 
Transform  might  be  more  efficient,  but  sharp  edges  are  a 
problem,  because  they  have  energy  over  a  wide  range  of 
frequency.  Thus  we  would  need  to  partition  the  image 
into  regions  each  with  separate  frequency 
decomposition,  which  leads  to  windowed  Fourier  (or 
cosine)  transforms,  for  example  the  discrete  cosine 
transform  (DCT)  used  in  JPEG  image  compression  [4]. 
A  similar  technique  is  used  in  encoding  audio  signals  in 
the  form  of  the  Gabor  transform  or  spectrogram  [5]. 
Thus edges can be decomposed separately, leaving smooth
regions to be encoded more efficiently.

The  windowed  Fourier  technique  is  quite  effective,  but 
this  type  of  coding  is  still  limited  because  a  fixed 
window  size  is  used.  If  a  large  window  is  used,  edges 
and  high  frequency  energy  are  coded  badly,  because 
there  is  significant  leakage  into  smooth  regions,  as 
windows  of  fixed  size  and  regular  spacing  do  not  usually 
fit  edges  well.  If  small  windows  are  used,  low  frequency 
smooth  regions  are  coded  badly,  as  there  are  too  many 
windows  replicating  information.  What  we  need  is  a 
variable-scale  window,  which  is  where  the  wavelet 
transform  comes  in.  The  above  coding  problems  are 
caused  by  a  lack  of  scale  invariance,  as  a  fixed  window 
does  not  treat  different  scales  alike. 

If  we  use  the  wavelet  transform,  the  signal  or  image  is 
decomposed  into  a  pyramid,  each  layer  having 
information  at  a  different  scale  and  level  of  detail.  Each 
layer  comprises  a  regular  grid,  where  at  each  point  there 
is  a  wavelet  coefficient  encoding  the  information  within 
the  image  at  that  particular  position  and  scale.  The  grid 
spacing  is  proportional  to  scale,  so  a  small  number  of 
coefficients  is  required  at  large  scale  and  low  resolution. 
Thus  smooth  parts  of  the  image  are  encoded  efficiently. 
At  small  scales  a  large  number  of  coefficients  is 
required,  but  in  smooth  areas  these  will  be  low  in 
magnitude,  and  can  be  ignored  with  minimal  loss  of 
information.  Thus  we  are  getting  what  we  want:  smooth 
regions  are  encoded  with  a  small  number  of  coefficients, 
and  other  regions,  such  as  edges,  are  encoded  with  a 
larger  number  of  coefficients.  Fig.  A  gives  an  example 
of  image  compression  using  symmetrical  Daubechies 
wavelets. 


(a)  Original  image 


(b)  Image  at  27:1  compression 

1.2  De-Noising 

If a signal or image is corrupted with noise, we wish to
recover as much of the original information as possible.
We cannot do a perfect job, because some parts of the
signal  will  be  indistinguishable  from  noise;  they  could 
have  arisen  with  some  probability  from  the  random 
process  generating  the  noise.  The  usual  method  is  to 
decompose  the  signal  into  a  set  of  functions  using  a 
prescribed  basis  (in  our  case  using  a  wavelet  basis), 
distinguish  the  components  that  come  from  noise  from 
those  that  don’t  (to  some  level  of  confidence),  remove 
the  former,  and  reconstruct  the  signal  or  image  from  the 
latter.  The  role  of  the  basis  is  to  do  the  best  possible  job 
of  separating  the  original  signal  and  noise.  The  best 
choice  of  basis  depends  both  on  what  we  expect  to  find 
in  the  uncorrupted  signal,  and  on  the  statistical  properties 
of  the  noise. 
When the expected signal is self-similar both in position
and  scale,  the  wavelet  transform  is  the  obvious  method 
of  decomposition.  If  we  have  Gaussian  white  noise  then 
it  turns  out  that  the  resulting  wavelet  coefficients  all 
have  the  same  Gaussian  distribution,  so  the  natural  way 
of  de-noising  is  to  set  a  threshold  on  the  amplitudes  of 
the  wavelet  coefficients,  and  to  reject  (set  to  zero)  all 
those  below  this  threshold.  If,  say,  the  probability  of  the 
wavelet  coefficients  from  the  noise  exceeding  this 
threshold  is  only  1/1000,  then  anything  remaining  is 
more  than  99.9%  likely  to  come  from  the  signal.  There  is 
a  trade-off  between  missing  too  much  of  the  signal  and 
leaving  too  much  of  the  noise,  and  the  required  balance 
affects  the  value  of  the  threshold.  Fig.  B  gives  an 
example  of  signal  de-noising  using  wavelet  analysis. 
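
The thresholding procedure described above can be sketched in a few lines. This is a minimal illustration only, not the method used to produce Fig. B: it assumes an orthonormal Haar wavelet (chosen here for brevity), a synthetic piecewise-constant test signal, and a known noise standard deviation `sigma`. The function names `haar_forward`, `haar_inverse` and `denoise` are hypothetical. The threshold of 3.29 standard deviations corresponds to a two-sided Gaussian exceedance probability of roughly 1/1000, matching the figure quoted in the text.

```python
import numpy as np

def haar_forward(signal, levels):
    """Orthonormal Haar pyramid: returns (coarse approximation, details coarsest-first)."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))   # wavelet coefficients at this scale
        approx = (even + odd) / np.sqrt(2.0)          # smoothed signal passed to the next scale
    return approx, details[::-1]

def haar_inverse(approx, details):
    """Invert haar_forward exactly, working from the coarsest scale to the finest."""
    for d in details:
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

def denoise(signal, levels, threshold):
    """Set to zero every wavelet coefficient whose magnitude is below the threshold."""
    approx, details = haar_forward(signal, levels)
    kept = [np.where(np.abs(d) >= threshold, d, 0.0) for d in details]
    return haar_inverse(approx, kept)

rng = np.random.default_rng(0)
sigma = 0.5                                            # assumed known noise level
clean = np.repeat([0.0, 4.0, 1.0, 3.0], 64)            # synthetic piecewise-constant signal
noisy = clean + rng.normal(0.0, sigma, clean.size)
# |z| > 3.29 has a probability of about 1/1000 for a standard Gaussian variable.
restored = denoise(noisy, levels=5, threshold=3.29 * sigma)

print("noisy RMS error:   ", np.sqrt(np.mean((noisy - clean) ** 2)))
print("restored RMS error:", np.sqrt(np.mean((restored - clean) ** 2)))
```

Because the transform is orthonormal, Gaussian white noise of standard deviation `sigma` produces coefficients with the same standard deviation at every scale, which is why a single threshold can be applied to all layers of the pyramid.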


Fig. B. Example of de-noising of a 1-dimensional signal
using wavelet analysis (panels include the original signal)

1.3  Detection 

Finally  we  briefly  consider  target  and  anomaly  detection, 
which  is  covered  in  more  detail  in  [6].  Detection  is  very 
similar  to  de-noising,  except  now  we  may  not  need  to 
reconstruct  the  uncorrupted  data.  It  is  therefore  often 
sufficient  to  record  the  position,  scale  and  amplitude  of 
the  wavelet  components,  and  so  an  inverse  to  the 
wavelet  transform  is  not  necessary.  This  gives  us  more 
flexibility  in  the  choice  of  the  wavelet  (or  other)  basis, 
not  just  in  the  shape  of  the  functions,  but  also  in  their 
spacing  in  positions  and  scale.  Typically  we  can  afford 
to  choose  a  higher  threshold  on  the  wavelet  coefficients, 
and  to  use  a  denser  pyramid,  thus  over-sampling  the 
wavelet  transform.  This  extra  processing,  discussed  in 
the  last  two  sections,  allows  better  target  discrimination, 
but  can  also  introduce  redundancy  in  the  representation 
of  the  target. 

2.  Fundamentals:  the  Continuous  Wavelet 
Transform 

This section introduces wavelet analysis for continuous
functions, where the concepts of translation and dilation
are clearest. Sections 3-5 cover the analysis of discretely
sampled signals.

2.1  Convolution 

The  wavelet  transform  is  essentially  a  multi-scale 
convolution  of  a  signal  with  a  filter,  called  the  analysing 
wavelet  or  mother  wavelet.  First  we  briefly  review 
single-scale convolution. Convolution of a signal f with a
filter g is defined as follows:

$$h = f * g ; \qquad h(x) = \int_{\mathbb{R}^n} f(u)\, g(x-u)\, du \qquad (1)$$

where integration is over the space on which the
functions f and g are defined. If f is a function of one
variable, e.g. an acoustic signal is a function of time,
then so must be the filter g, and the integral is
one-dimensional, i.e. on the real line. The convolution output
h is also a function of one variable: x is a scalar quantity.
For image processing f is a function of two variables, so
x and u are vectors, each with two scalar components.
The  integral  is  two-dimensional,  i.e.  over  the  image 
plane.  Convolution  can  also  be  done  in  higher 
dimensions,  for  example  when  analysing  time-sequenced 
imagery  or  medical  tomography. 

In both cases the underlying principle is the same: we
take a filter function g, reverse it in space or time, and
slide it over the signal f over all positions x, which is
done by translating g, and is why the argument of g
under the integral is x-u, not u. For images the
translation  is  a  vector,  allowing  the  filter  to  be  positioned 
anywhere  within  the  image.  The  value  of  the  convolution 
output  h  at  x  tells  us  how  the  signal  or  image  interacts 
with  the  filter  at  that  particular  position  x. 

Convolution  is  translation  invariant;  if  the  signal  is 
shifted,  then  the  convolution  output  is  shifted  by  the 
same  amount.  Thus  convolution  is  a  natural  precursor  to 
wavelet  analysis,  suitable  for  analysing  signals  with 
translation  invariance,  where  the  information  sought  is 
equally  likely  to  occur  at  any  position  (or  time). 
However,  convolution  does  not  treat  scales  on  an  equal 
footing;  if  the  signal  is  dilated,  the  convolution  output  is 
not  dilated  or  simply  related  in  any  other  way.  Table  1 
shows  some  simple  examples  of  convolution  filters. 
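
The two properties just stated can be checked numerically. The sketch below uses NumPy's `np.convolve` as a discrete stand-in for the convolution integral; the pulse, the three-tap smoothing filter and the shift of 5 samples are arbitrary choices made for illustration, not values from the text.

```python
import numpy as np

def convolve(f, g):
    """Discrete analogue of the convolution integral: h[x] = sum_u f[u] g[x - u]."""
    return np.convolve(f, g, mode="full")

f = np.zeros(32)
f[10:14] = 1.0                           # a short pulse, well away from the ends
g = np.array([1.0, 2.0, 1.0]) / 4.0      # small smoothing filter (illustrative)

# Translation: shifting the input shifts the output by the same amount.
shift = 5
h = convolve(f, g)
h_shifted = convolve(np.roll(f, shift), g)
print(np.allclose(h_shifted[shift:], h[:-shift]))

# Dilation: stretching the input by 2 does not simply stretch the output by 2.
h_dilated = convolve(np.repeat(f, 2), g)
print(np.count_nonzero(h_dilated), np.count_nonzero(np.repeat(h, 2)))
```

The first check prints `True`. The second shows that the response to the dilated pulse occupies a different number of samples from the dilated response, because the filter itself has not been rescaled.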

The  top  hat  is  a  local  average,  so  it  integrates  the  signal 
over  an  interval  of  unit  length,  and  the  output  of  h 
depends  on  the  starting  point.  This  will  be  good  at 
identifying  regions  in  the  signal  with  high  (or  low)  local 
average,  for  example  a  pulse,  but  will  also  respond  well 
to  signals  with  a  high  global  average,  for  example  a 
constant  non-zero  function.  Thus  it  will  be  good  at 
discriminating  pulses  from  the  background  so  long  as  the 
local  mean  of  the  latter  is  always  small,  which  requires 
the  background  to  be  uncorrelated  over  lengths 
comparable  to  the  scale  of  the  filter,  for  example  zero- 
mean  white  noise.  In  this  case  it  will  be  better  at  picking 
up pulses of approximately unit length than of much
smaller  or  longer  lengths  because  the  signal  to  noise 
ratio  is  higher.  This  is  the  principle  behind  matched 
filters.  The  important  point  is  that  the  effectiveness  of 
the  filter  depends  on  the  scale  of  the  object  it  is  trying  to 
detect. 

The  Gaussian  pulse  is  similar  to  the  top  hat,  but  being 
smoother  it  is  less  sensitive  to  high  frequencies,  and  thus 
better  at  picking  up  smoother  objects.  The  edge  detector 
is  rather  different,  as  it  only  responds  to  changes  or 
gradients  within  the  signal,  because  anything  constant  is 
cancelled  out  by  the  up  and  down  pulses:  the  filter  has 
zero-mean.  Thus  this  is  a  good  edge  detector,  especially 
if  the  background  is  highly  correlated,  e.g.  Brownian 
noise,  because,  looking  for  differences  only,  it  ignores 
highly  correlated  regions.  The  same  thing  goes  for  the 
Mexican  hat  or  Difference  of  Gaussian  (DoG)  filter, 
except  it  is  symmetrical,  and  responds  best  to  2-sided 
edges  (filaments  in  images).  Again,  there  is  a  scale- 
dependence;  in  Brownian  noise  these  filters  respond  best 
to  smooth  ramps  whose  width  is  approximately  unity. 

2.2  The  Wavelet  Transform 

The  wavelet  transform  removes  this  scale-dependence 
by  repeating  the  convolution  of  Equation  (1)  at  multiple 
scales,  producing  a  function  of  position  and  scale: 

$$w(x,s) = s^{-n/2} \int_{\mathbb{R}^n} f(u)\, g\!\left(\frac{x-u}{s}\right) du \qquad (2)$$

so that the filter is dilated by a factor s as well as
translated by an offset x. The power of scale in front of
the integral is a normalisation factor similar to the
factors involving π used in the Fourier transform. One
useful property of this normalisation factor, discussed in
Section 8, is that the expected wavelet transform of
white noise is independent of scale. Now all information
is  treated  similarly,  regardless  of  position  and  scale.  Any 
translation  and  dilation  of  the  signal  or  image  will  result 
in  a  similar  translation  and  dilation  of  the  wavelet 
transform.  The  filter  g  is  known  as  the  analysing 
wavelet  or  mother  wavelet,  and  depending  on  its  shape 
(e.g.  Table  1),  the  wavelet  transform  will  be  good  at 
detecting  top  hats,  pulses  and  edges  at  all  positions  and 
scales. 
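
The statement that a dilation of the signal produces a corresponding dilation of the wavelet transform can be illustrated numerically. The sketch below is an assumption-laden illustration rather than a practical implementation: it works in one dimension, takes the normalisation in front of the integral to be s^(-1/2), approximates the integral by a Riemann sum on a fine grid, and uses the second derivative of exp(-x^2) as mother wavelet. The helper names `mother_wavelet` and `cwt_point` are hypothetical. Under these assumptions, dilating the signal by a factor a gives a transform equal to sqrt(a) times the original transform evaluated at position x/a and scale s/a.

```python
import numpy as np

def mother_wavelet(x):
    """Second derivative of exp(-x^2), i.e. (4x^2 - 2) exp(-x^2)."""
    return (4.0 * x**2 - 2.0) * np.exp(-x**2)

def cwt_point(f, x, s, u):
    """Riemann-sum estimate of w(x, s) = s^(-1/2) * integral of f(u) g((x - u)/s) du."""
    du = u[1] - u[0]
    return s**-0.5 * np.sum(f(u) * mother_wavelet((x - u) / s)) * du

u = np.linspace(-30.0, 30.0, 60001)       # integration grid (illustrative choice)

def signal(t):
    return np.exp(-t**2)                  # smooth test signal

a = 2.0                                   # dilation factor

def dilated_signal(t):
    return signal(t / a)

lhs = cwt_point(dilated_signal, x=1.0, s=2.0, u=u)
rhs = np.sqrt(a) * cwt_point(signal, x=1.0 / a, s=2.0 / a, u=u)
print(lhs, rhs)                           # the two values agree to numerical precision
```

A single-scale convolution has no such relation between the two outputs, which is the scale-dependence that the wavelet transform is designed to remove.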


2.3  Inverse  Wavelet  Transform  and  Admissibility 

As you would expect for a useful transform, there is an
inversion formula:

$$f(x) = C^{-1} \int_{\mathbb{R}^{n+1}} w(u,s)\, g\!\left(\frac{x-u}{s}\right) s^{-\left(\frac{3n}{2}+1\right)}\, du\, ds \qquad (3)$$


where C is a normalisation constant given by:

$$C = 2\pi \int \frac{|\hat{g}(\omega)|^{2}}{|\omega|}\, d\omega \qquad (4)$$


where the hat denotes a Fourier transform and ω is
Fourier frequency. This formula is analogous to the
continuous Fourier transform inverse, in that both
transforms look very similar to their inverses, and
indeed, the wavelet inverse is easiest to derive in the
Fourier domain, using the Fourier inversion theorem.
The wavelet transform inverse is more powerful,
because it works for a large family of mother wavelets,
in fact any function g for which the normalisation
constant C is finite, whereas the continuous Fourier
transform involves convolution with complex
exponentials only. The finiteness of C imposes a
significant constraint on g however, called the
admissibility condition, in particular requiring g to have
zero mean. Thus the inversion formula (3) does not work
with the top hat and Gaussian pulse functions in Table 1.
Many practitioners of wavelet analysis require the
admissibility condition as part of the definition of a
wavelet. However, the wavelet transform (2) still has
meaning, and translation and dilation invariance, even
without this condition; it is mainly when using the
inversion formula (3) that the admissibility condition is
required.


Table 1. Example convolution filter functions.

Name                    Function
Top hat                 $g(x) = 1$ on an interval of unit length, $0$ elsewhere
Gaussian pulse          $g(x) = \exp(-x^2)$
Simple edge detector    $g(x) = \frac{d}{dx}\left(\exp(-x^2)\right)$
Mexican hat (DoG)       $g(x) = \frac{d^2}{dx^2}\left(\exp(-x^2)\right)$
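
The zero-mean requirement can be checked numerically for the four filters of Table 1. The sketch below is illustrative only: the top hat is assumed to be centred on the origin, the derivative filters are written out analytically from exp(-x^2), and the integrals are approximated by Riemann sums on a grid chosen for convenience.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 20001)       # symmetric grid, spacing 0.001
dx = x[1] - x[0]

filters = {
    "top hat": np.where(np.abs(x) <= 0.5, 1.0, 0.0),           # unit-length interval (assumed centred)
    "Gaussian pulse": np.exp(-x**2),
    "simple edge detector": -2.0 * x * np.exp(-x**2),          # d/dx exp(-x^2)
    "Mexican hat (DoG)": (4.0 * x**2 - 2.0) * np.exp(-x**2),   # d^2/dx^2 exp(-x^2)
}

# Integral of each filter over the real line, approximated by a Riemann sum.
integrals = {name: float(np.sum(g) * dx) for name, g in filters.items()}
for name, value in integrals.items():
    print(f"{name:22s} integral = {value: .6f}")
```

The first two filters integrate to a clearly non-zero value, so they fail the zero-mean requirement, whereas the two derivative filters integrate to zero to within rounding error.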

3.  Discrete  Wavelets  and  Filter  Banks 

3.1  The  Effects  of  Sampling 

The continuous wavelet transform is sound theoretically,
but it is not applicable to signal and image analysis with
digital  computers,  which  require  discretely  sampled  data, 
discrete  filters,  and  where  integration  is  replaced  with 
finite  summation.  The  same  argument  applies  to  Fourier 
analysis,  which  is  why  in  practice  the  discrete  Fourier 
transform  is  used,  often  implemented  as  the  fast  Fourier 
transform  (FFT).  Similar  implementations  have  been 
developed  for  wavelet  analysis,  and  there  is  an  elegant 
relationship  between  the  continuous  and  discrete  cases, 
described  in  Section  4. 

We  require  discrete  equivalents  for  the  operations  shown 
in  Table  2.  Dilation  is  the  main  cause  of  difficulty,  and 
the  reason  for  various  complications  in  the  theory  of  the 
discrete  wavelet  transform,  because  downsampling  and 
upsampling  are  not  invertible  even  though  they  appear 
superficially  to  be  inverses  of  each  other.  It  is  true  that 
upsampling  followed  by  downsampling  is  the  identity, 
leaving  the  signal  unchanged,  but  if  these  operations  are 
applied  in  the  reverse  order  all  the  samples  whose  index 
k  is  not  divisible  by  p  are  set  to  zero,  and  thus 
information  is  lost. 
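
The asymmetry between the two orderings is easy to see on a short example. This sketch follows the discrete formulas of Table 2 with p = 2; the helper names `downsample` and `upsample` and the eight-sample test signal are illustrative.

```python
import numpy as np

def downsample(f, p):
    """Keep every p-th sample: f(k) -> f(pk)."""
    return f[::p]

def upsample(f, p):
    """Insert p - 1 zeros between samples: f(k) -> f(k/p) if k/p is an integer, else 0."""
    out = np.zeros(len(f) * p, dtype=f.dtype)
    out[::p] = f
    return out

f = np.arange(1, 9)                              # samples 1, 2, ..., 8
p = 2
up_then_down = downsample(upsample(f, p), p)     # recovers f exactly
down_then_up = upsample(downsample(f, p), p)     # odd-indexed samples are lost
print(up_then_down)
print(down_then_up)
```

The second result keeps only the samples whose index is divisible by p, which is the loss of information that the filter banks of the next subsection are designed to avoid.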


3.2  Filter  Banks  and  Perfect  Reconstruction 

We  need  to  avoid  losing  information,  otherwise  the 
discrete  wavelet  transform  will  not  be  invertible,  and  the 
signal  or  image  would  not  be  fully  represented.  For  this 
reason  it  is  necessary  to  apply  more  than  one  discrete 
filter  to  the  data,  in  fact  p  filters,  where  p  is  the 
resampling  factor.  Thus  discrete  wavelet  analysis 
involves  the  application  of  filter  banks.  Fig.  1  shows  the 
process,  involving  a  single  dilation  and  its  inverse,  in 
diagrammatic  form. 

Fig. 1. Signal analysis and synthesis (block diagram: Analysis stage, transformed signal, Synthesis stage)


If  the  reconstructed  signal  coming  from  the  synthesis 
channel  is  identical,  barring  a  delay,  to  the  input  to  the 
analysis  channel,  the  filter  bank  is  called  a  perfect 
reconstruction  (PR)  filter  bank.  H*  are  called  analysis 
filters  and  F*  are  called  synthesis  filters.  Both  are 
discrete,  linear  and  translation  invariant  (to  a  resolution 
of  one  sample),  and  in  general  are  implemented 
recursively: 

$$h(n) = \sum_k a(k)\,h(n-k) + \sum_l b(l)\,f(n-l) \qquad (5)$$

where the coefficients a(k) and b(l) are finite and their
number defines the order of the filter. All such filters can
be implemented by discrete convolution, where there are
no recursive coefficients a(k), but there may be infinitely
many b(l). The latter may be obtained by applying the
filter to a delta function, or impulse, and hence are
denoted the impulse response. Where the response is
infinite, the filter is known as infinite impulse response
(IIR), otherwise finite impulse response (FIR).


Table 2. Continuous and discrete operation analogues

Continuous    Discrete                                Discrete Formula
Integration   Summation                               Σ_k
Translation   Shift to left or right by integer p     f(k) → f(k−p)
Dilation      Downsampling by integer factor p        f(k) → f(pk)
              Upsampling by integer factor p          f(k) → f(k/p) if k/p is an integer, 0 otherwise

The other components of the filter bank are to do with
resampling, where ↓p denotes downsampling, and ↑p
denotes upsampling, both by a factor of p. Thus this
filter bank has the discrete analogues of both
convolution and dilation, and thus contains all the
ingredients required for wavelet analysis. Most (though
not all) resampling in wavelet analysis is by a factor of
p=2, because like the fast Fourier transform the process
is most efficient this way, so from this point we will
assume p=2.

When  designing  filters,  including  PR  filters,  it  is 
convenient  to  use  the  Z-transform,  where  the  coefficients 
a  and  b  are  each  assembled  into  a  polynomial,  and  the 
transform  is  the  ratio  of  these  polynomials,  where  the 
independent  variable  is  conventionally  written  as  z.  In 
this  representation,  the  following  conditions  are 
necessary  and  sufficient  for  perfect  reconstruction: 

$$F_0(z)H_0(z) + F_1(z)H_1(z) = 2z^{-l}$$
$$F_0(z)H_0(-z) + F_1(z)H_1(-z) = 0 \qquad (6)$$

In this notation, multiplication is equivalent to
composition of filters, changing the sign of z is
equivalent to changing the sign of every odd-indexed
filter coefficient, and z^-l is equivalent to a
delay of l samples. The first equation ensures zero
distortion and the second prevents aliasing.
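
The two perfect reconstruction conditions can be verified numerically for a simple case. The sketch below assumes the two-tap filters H0 = [1/2, 1/2] and H1 = [1/2, -1/2], with synthesis filters chosen as F0(z) = 2H1(-z) and F1(z) = -2H0(-z); this pairing is one common alias-cancelling choice and is an assumption here, not a construction taken from the text. Filters are stored as coefficient arrays in powers of z^-1, so multiplication of transforms becomes `np.convolve`.

```python
import numpy as np

def neg_z(h):
    """Coefficients of H(-z): flip the sign of the odd-indexed taps."""
    return h * (-1.0) ** np.arange(len(h))

# Analysis filters (assumed two-tap averaging and differencing filters).
h0 = np.array([0.5, 0.5])
h1 = np.array([0.5, -0.5])

# Synthesis filters from the assumed alias-cancelling choice.
f0 = 2 * neg_z(h1)    # [1, 1]
f1 = -2 * neg_z(h0)   # [-1, 1]

# First condition: F0(z)H0(z) + F1(z)H1(z) should equal 2 z^-l.
distortion = np.convolve(f0, h0) + np.convolve(f1, h1)

# Second condition: F0(z)H0(-z) + F1(z)H1(-z) should vanish.
alias = np.convolve(f0, neg_z(h0)) + np.convolve(f1, neg_z(h1))

print(distortion)  # coefficients of z^0, z^-1, z^-2
print(alias)
```

For these filters the distortion term is 2z^-1, a pure delay of one sample, and the alias term is identically zero.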

3.3 Multiple Resolution: Discrete Wavelet Transform

We  now  have  two  channels  in  our  analysis  and  synthesis 
(reconstruction)  filter  banks.  The  idea  is  to  decompose 
the  signal  into  components  at  multiple  resolutions,  with 
octave  dilation  factors  (in  general  powers  of  p).  We 
lower  the  resolution,  i.e.  shrink  the  signal  or  image,  by 
downsampling,  and  reconstruct  by  upsampling.  To  work 
with multiple resolutions, we apply the filter bank in Fig. 1
recursively:


[Figure: the filter bank of Fig. 1 applied recursively, running from high resolution to low resolution]


Fig.  2.  Multiresolution  Filter  Bank 


As the object is to shrink the image, it is conventional to
apply a smoothing (lowpass) filter, to avoid aliasing. By
convention, therefore, H0 is a lowpass filter and H1 is a
highpass filter, so that all information about the signal or
image is retained. Eventually we are left with just a
small number of coefficients at the lowest resolution (in
the limit just one), and a pyramid of highpass output
values at multiple resolutions, each resolution (for 1D
signals) having half the number of coefficients of the
previous resolution. The case is slightly different for
higher dimensions, for example images, covered in
Section 6. This representation is the discrete wavelet
transform, sometimes called a pyramid, because the
number of coefficients decreases at each new level. The
reconstruction of the signal from the pyramid is also
done recursively in reverse order.
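
The recursion and its inverse can be sketched in a few lines. The example below assumes the two-tap filters [1/2, 1/2] and [1/2, -1/2] and a signal whose length is a power of two; the function names and the test signal are illustrative only.

```python
import numpy as np

def haar_step(x):
    """One filter bank stage: lowpass and highpass filtering, each followed by downsampling by 2."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / 2, (even - odd) / 2

def haar_dwt(x):
    """Return [coarsest lowpass, coarsest detail, ..., finest detail]."""
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        x, d = haar_step(x)      # only the lowpass output is passed to the next level
        details.append(d)
    return [x] + details[::-1]

def haar_idwt(pyramid):
    """Rebuild the signal by undoing the stages in reverse order."""
    x = pyramid[0]
    for d in pyramid[1:]:
        out = np.empty(2 * len(x))
        out[0::2] = x + d
        out[1::2] = x - d
        x = out
    return x

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
pyramid = haar_dwt(signal)
restored = haar_idwt(pyramid)

print([len(level) for level in pyramid])   # one lowpass value, then 1, 2 and 4 detail values
print(np.allclose(restored, signal))
```

The pyramid holds exactly as many coefficients as the original signal, and each level of highpass output has half the length of the level above it.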

The highpass outputs are often referred to as detail
coefficients, because they effectively siphon information
at a particular resolution, the lowpass coefficients going
to the next level. At each resolution, the combination of
a number of iterations of H0, followed by H1, can be
regarded as a bandpass filter, and is the analogue of
convolution of the signal with the analysing wavelet at
the appropriate scale.

The method of convolution is rather different, however.
In Equation (2) the convolution is done by dilating the
filter g but keeping the signal f fixed. In the discrete
implementation the filter is fixed and the signal is dilated
by the inverse factor. The two operations are equivalent
in the continuous case, as can be seen by substituting u/s
for u in Equation (2), but in the discrete case they are
not, because upsampling and downsampling are not
inverses.

The  main  reason  for  shrinking  the  signal  rather  than 
expanding  the  filter  is  efficiency,  as  in  the  former  case 
the  computation  decreases  with  resolution,  whereas  in 
the  latter  case  the  computational  load  increases,  as  the 
filter  coefficients  increase  in  number.  However,  there  is 
a  drawback.  Downsampling  means  that  the  wavelet 
transform  is  evaluated  on  an  increasingly  sparse  grid  as 
the  resolution  decreases.  This  grid  is  prescribed,  so  the 
discrete  wavelet  transform  is  not  translation  invariant.  If 
we  shift  the  signal  by  k,  the  wavelet  transform  is 
translation-invariant  only  at  resolutions  which  divide 
into  k  perfectly.  Eventually  the  dilation  factor  will 
exceed  k,  and  so  translation  invariance  will  break  down. 
The  wavelet  transform  of  the  translated  signal  effectively 
“falls  between”  points  in  the  grid  above  a  certain  scale. 
The  discrete  wavelet  transform  is  only  scale-invariant  for 
octave  (powers  of  p)  scale  changes,  for  similar  reasons. 
This  failing  has  implications  for  detection  and 
classification  of  objects,  as  they  do  not  necessarily  result 
in  the  same  signature  in  the  discrete  wavelet  transform 
when  translated  or  dilated.  The  continuous  wavelet 
transform  does  not  have  these  disadvantages. 
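
The loss of translation invariance is easy to observe at the finest level. The following sketch assumes the Haar highpass filter [1/2, -1/2], a circular shift (`np.roll`) to avoid edge effects, and a hypothetical eight-sample test signal.

```python
import numpy as np

def haar_detail(x):
    """Finest-level highpass output: filter with [1/2, -1/2], then downsample by 2."""
    return (x[0::2] - x[1::2]) / 2

x = np.array([0, 0, 1, 1, 0, 0, 0, 0], dtype=float)

d0 = haar_detail(x)               # original signal
d1 = haar_detail(np.roll(x, 1))   # shifted by one sample
d2 = haar_detail(np.roll(x, 2))   # shifted by two samples

print(d0)
print(d1)
print(d2)
```

A shift by two samples simply shifts the detail coefficients by one position, but a shift by one sample produces a different set of coefficients altogether: the same object yields non-zero detail values in one position and none in the other.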
4. The Dilation and Wavelet Equations


4.1  Wavelets  and  Filter  Banks 


The multiresolution filter bank of Fig. 2 is essentially
how the discrete wavelet transform is implemented, but
the relationship with the continuous wavelet transform is
rather loose, based on the analogy between dilation and
resampling. Under certain conditions however, described
in this section, there is a much stronger link between
wavelets and filter banks, discovered by Stéphane
Mallat, summarised next. It is based on a construction
which allows continuous wavelet transform coefficients
to be computed using multiresolution filter banks. This is
done using two equations: the dilation equation:

$$\phi(x) = \sum_k 2 h_0(k)\,\phi(2x-k) \qquad (7)$$


and the wavelet equation:

$$w(x) = \sum_k 2 h_1(k)\,\phi(2x-k) \qquad (8)$$

where h0 and h1 are the impulse response coefficients of
the filters H0 and H1 respectively (the equivalent of the
coefficients b in Equation (5) if the coefficients a are all
zero). φ is called the scaling function and w is the
analysing wavelet, which has the same role as g in
Equation (2). When convolved with the signal, φ acts as
a lowpass filter (in much the same way as the discrete
filter H0) and w is a bandpass filter (analogous to H1).
Equations (7) and (8) allow the wavelet transform at one
scale to be calculated from the same transform at half
this scale, without direct convolution, using the discrete
filters H0 and H1. To see how this works, substitute φ and
w for g in Equation (2) to produce two functions W0(x,s)
and W1(x,s), and then sample these functions on a
discrete grid with octave scales and position spacing
proportional to scale (pyramid sampling):


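$$a(p,q) = W_0(2^{-q}p,\,2^{-q}), \qquad b(p,q) = W_1(2^{-q}p,\,2^{-q}); \qquad p,q \text{ integers} \qquad (9)$$

We then have the following recurrence relations for a and b:

$$a(p,q) = \sum_k h_0(k-2p)\,a(k,q+1)$$
$$b(p,q) = \sum_k h_1(k-2p)\,a(k,q+1) \qquad (10)$$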

which is identical to using the coefficients a(k,q+1) as
input data into the filter bank of Fig. 1 to produce
a(p,q) and b(p,q) as outputs. The multiresolution
filter bank of Fig. 2 will therefore produce the values of
the wavelet transform function W1 on the pyramid grid,
but much more efficiently than by direct convolution.

The  filter  bank  in  Fig.  2  works  because  the  scaling 
function  and  analysing  wavelet  are  carefully  designed  so 
that  these  functions  can  be  dilated  by  translation,  scalar 
multiplication  and  summation,  using  equations  (7)  and 
(8). This is a delicate process, as we require linear
combinations of the function φ and a number of
translated replicas to combine to produce the same
profile,  but  dilated.  It  is  a  bit  like  a  self-similar  jigsaw 
puzzle:  the  jigsaw  pieces  at  one  scale  have  to  fit  together 
perfectly  to  produce  the  same  jigsaw  piece  doubled  in 
size. 


4.2  Haar  Wavelets 

We demonstrate the use of the scaling and wavelet
equations with the Haar scaling function and wavelet,
which until the 1980s was the only example of a
function of compact support known to solve these
equations. We begin with the very crude lowpass and
highpass filters H0 = [1/2, 1/2] and H1 = [1/2, -1/2]. We
now have the following dilation and wavelet equations:

$$\phi(x) = \phi(2x) + \phi(2x-1)$$
$$w(x) = \phi(2x) - \phi(2x-1) \qquad (11)$$

which have the Haar scaling function and wavelet as
solutions, shown in Table 3.

Table 3. Haar scaling function and wavelet

Name                    Function                                                     Shape
Haar scaling function   φ(x) = 1 for 0 ≤ x < 1; 0 otherwise                          top hat
Haar wavelet            w(x) = 1 for 0 ≤ x < 1/2; −1 for 1/2 ≤ x < 1; 0 otherwise    single square-wave cycle


In this simple case it is obvious how the Haar scaling
function (top hat) solves the dilation equation, as the
summands on the right hand side do not overlap, but the
existence of other, less trivial solutions (with different
filters H0 and H1) which are smooth and do overlap is
much more interesting and useful. The Haar wavelet has
some good properties; it is very compact, has a very
simple two-point filter and its pyramid translations and
dilations form an orthogonal basis. It is not smooth,
however, and thus has very poor localisation of Fourier
frequency. A major advance was made in the mid 1980s
by Ingrid Daubechies, who discovered a family of
smooth wavelets which also solve the dilation and
wavelet equations. These are now widely used in signal
and image compression.
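
That the Haar functions solve the dilation and wavelet equations can be confirmed pointwise. The sketch below assumes half-open intervals for the top hat and evaluates both sides on a dyadic grid, so that every comparison is exact in floating point; the function names are illustrative.

```python
import numpy as np

def phi(x):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0) & (x < 1), 1.0, 0.0)

def w(x):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0) & (x < 0.5), 1.0, 0.0) - np.where((x >= 0.5) & (x < 1), 1.0, 0.0)

x = np.arange(-64, 129) / 64.0                   # dyadic grid on [-1, 2]

dilation_rhs = phi(2 * x) + phi(2 * x - 1)       # two non-overlapping half-width top hats
wavelet_rhs = phi(2 * x) - phi(2 * x - 1)

print(np.array_equal(phi(x), dilation_rhs))
print(np.array_equal(w(x), wavelet_rhs))
```

The two half-width top hats tile the interval [0, 1) without overlapping, which is why the solution is evident in this case.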

4.3  Existence  and  Construction  of  Wavelets 


The obvious remaining issue is knowing when there are
solutions to the dilation and wavelet equations, and how
to find them. Existence and uniqueness depend on the
following Toeplitz matrix:


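$$T(H_0) = \begin{pmatrix} h_0(1) & h_0(2) & h_0(3) & h_0(4) & \cdots \\ & h_0(1) & h_0(2) & h_0(3) & \cdots \\ & & h_0(1) & h_0(2) & \cdots \\ & & & h_0(1) & \cdots \\ & & & & \ddots \end{pmatrix} \qquad (12)$$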


derived from the lowpass filter H0. We derive another
matrix, called the transition matrix:

$$T_0 = 2\,(\downarrow 2)\,T(H_0)\,T(H_0)^{T} \qquad (13)$$


The dilation and wavelet equations have a unique
solution if the eigenvalues of the transition matrix are
less than unity in magnitude, except for a single
eigenvector with unit eigenvalue. Moreover, when this
happens we have a simple recursive recipe for calculating
the scaling function φ: it is the limit of the following
convergent sequence of functions:

$$\phi_{n+1}(x) = \sum_k 2 h_0(k)\,\phi_n(2x-k) \qquad (14)$$

whose  resemblance  to  the  dilation  equation  (7)  is 
obvious.  The  wavelet  function  can  be  derived  directly 
from  the  scaling  function  using  equation  (8). 
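
The iteration can be carried out on sampled values. The sketch below assumes the starting function is the top hat on [0, 1) and represents each iterate by its piecewise-constant values on a grid of spacing 2^-n, in which case one step of the recursion reduces to a convolution with the filter taps spread 2^n cells apart. The four-tap filter used as the second example is the standard Daubechies filter, scaled here so that its taps sum to one; both the function name and this choice of example are assumptions made for illustration.

```python
import numpy as np

def cascade(h0, n_iter):
    """Iterate phi_{n+1}(x) = sum_k 2 h0(k) phi_n(2x - k), starting from the top hat.

    Returns the piecewise-constant values of the final iterate on a grid
    of spacing 2**-n_iter, starting at x = 0.
    """
    c = np.array([1.0])                               # top hat sampled on a grid of spacing 1
    taps = 2.0 * np.asarray(h0, dtype=float)
    for n in range(n_iter):
        spread = np.zeros((len(taps) - 1) * 2**n + 1)
        spread[:: 2**n] = taps                        # taps placed 2**n grid cells apart
        c = np.convolve(c, spread)
    return c

# Haar filter: the top hat is reproduced exactly at every iteration.
haar = cascade([0.5, 0.5], 3)

# Four-tap Daubechies filter, normalised so that the taps sum to 1.
s3 = np.sqrt(3.0)
d4 = cascade(np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8, 6)

print(haar)
print(len(d4) / 2**6)        # approximate width of the support
print(d4.sum() / 2**6)       # integral of the scaling function
```

For the Haar filter the iteration is stationary, while for the four-tap filter it approaches a function supported on an interval of width close to three, whose integral remains equal to one at every step.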


To  summarise,  we  have  achieved  a  huge  gain  in 
efficiency  by  calculating  the  wavelet  transform  using  a 
discrete  multiresolution  filter  bank,  but  at  a  price,  as  we 
have  imposed  a  constraint  on  the  wavelet  function  w  in 
the  form  of  the  dilation  and  wavelet  equations.  For  many 
applications  the  shape  of  the  wavelet  is  not  critical,  as 
long  as  it  has  the  required  compactness  in  space  or 
frequency,  but  there  are  some  applications,  for  example 
target  detection,  where  the  shape  is  more  important.  We 
have also constrained the evaluation of the wavelet
transform to a discrete pyramid grid, which is also
unsuitable  for  applications  where  translation  and  scale 
invariance  are  important. 


5.  Wavelet  Varieties 

As  with  filter  design,  there  are  various,  sometimes 
conflicting  requirements  of  wavelet  analysis,  so  there  are 
different  types  of  wavelets  which  are  suitable  for 
different  applications,  discussed  in  this  section. 

Although the scaling and wavelet functions are uniquely
determined by Equations (7) and (8), they can still be
controlled by the coefficients of the filter H0. The typical
approach to wavelet design, therefore, is to design this
filter first, along with H1. The reconstruction filter is
then derived from the perfect reconstruction equations
(6), which provides the inverse to the discrete wavelet
transform. In this section we review briefly some of the
many varieties of wavelets and filter banks that are
available for 1D signal analysis. Higher-dimensional
signals, including images, are considered in Section 6.

5.1  Orthonormal  Wavelets 

The best-known wavelets are the orthonormal wavelets
discovered by Ingrid Daubechies. Here the filter H0 is
designed such that the analysing wavelet and all its
translations and dilations on the pyramid grid are
mutually orthogonal and have unit energy:

$$\int w(2^q x + k)\,w(2^l x + p)\,dx = \begin{cases} 1 & q = l \text{ and } k = p \\ 0 & \text{otherwise} \end{cases}; \qquad k,l,p,q \text{ integers} \qquad (15)$$

Orthonormal  functions  are  liked  by  mathematicians 
because  transforms  which  use  these  functions  are  very 
stable,  and  trivial  to  invert,  so  reconstruction  of  the 
signal  or  image  is  very  easy  and  efficient.  In  the  case  of 
the  wavelet  transform  the  inverse  is  given  by: 

$$f(x) = \sum_{p,\,q \ge q_0} b(p,q)\,w(2^q x + p) + \sum_p a(p,q_0)\,\phi(2^{q_0} x + p) \qquad (16)$$

so  the  wavelet  coefficients  on  the  pyramid  are  the 
weighting  factors  required  to  reconstruct  the  signal  or 
image f. A necessary and sufficient condition for
orthonormal  wavelets  is  that  the  filter  H0  is  double-shift 
orthogonal,  which  means  that  when  convolved  with  its 
transpose,  all  the  even  coefficients  are  zero  except  at 
zero,  where  the  coefficient  is  two.  The  odd  coefficients 
do  not  affect  orthogonality  of  the  wavelets.  In  the 
Fourier  domain  these  filters  are  known  as  half-band, 
because  the  power  spectrum  added  to  a  mirror  image 
about  half  the  Nyquist  adds  to  unity  at  all  frequencies: 
$$|H_0(\omega)|^2 + |H_0(\pi-\omega)|^2 = 1 \qquad (17)$$

The highpass filter H1 is thus derived from H0 by
changing the signs of the odd coefficients and then
transposing. The synthesis part of the filter bank is
identical to the analysis part except for a transpose: h0(k)
= f0(-k) and h1(k) = f1(-k).
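
Double-shift orthogonality and the derivation of the highpass filter can be checked for a concrete filter. The sketch below assumes the standard four-tap Daubechies lowpass filter, normalised to unit energy; with this normalisation the zero-lag value of the autocorrelation is one, and it scales with the square of any other normalisation chosen (a factor of √2 on the taps gives two). The highpass filter is built by reversing the lowpass filter and negating alternate taps, which is one common reading of the rule stated above.

```python
import numpy as np

s3 = np.sqrt(3.0)
# Four-tap Daubechies lowpass filter, unit energy (an assumed normalisation).
h0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

# Convolution with the transpose (time reverse) gives the autocorrelation.
autocorr = np.convolve(h0, h0[::-1])     # lags -3..3, zero lag at index 3
even_lags = autocorr[1::2]               # lags -2, 0, +2

# Highpass filter: reverse the lowpass filter and negate alternate taps.
h1 = h0[::-1] * (-1.0) ** np.arange(len(h0))
cross_even = np.convolve(h0, h1[::-1])[1::2]   # even lags of the cross-correlation

print(even_lags)
print(cross_even)
```

The even lags of the autocorrelation vanish except at zero, and the lowpass and highpass filters are orthogonal at every even shift, which is what the pyramid sampling requires.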


The remaining task is to design the coefficients of H0 to
satisfy Equation (17). This is a complicated process, so
only an outline of one method to derive orthogonal
wavelets will be given here. First a power spectrum
|H0(ω)|² is found satisfying Equation (17), which for FIR
filters means finding a finite symmetric polynomial
satisfying:

$$P(x) + P(1-x) = 1 \qquad (18)$$

but where for smooth, band-limited wavelets it is also
desirable to have P and as many derivatives as possible
zero at x=0 and x=1, except P(0)=1. The family of
solutions, called maxflat filters, is given by:

$$P(x) = (1-x)^p \sum_{k=0}^{p-1} \binom{p+k-1}{k} x^k \qquad (19)$$
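
The maxflat polynomials can be evaluated directly to confirm that they satisfy the half-band condition. The sketch below assumes the usual closed form P(x) = (1-x)^p Σ C(p+k-1, k) x^k with the sum running over k = 0, ..., p-1; the function name is illustrative.

```python
import numpy as np
from math import comb

def maxflat(x, p):
    """P(x) = (1 - x)**p * sum_{k=0}^{p-1} C(p+k-1, k) * x**k."""
    x = np.asarray(x, dtype=float)
    series = sum(comb(p + k - 1, k) * x**k for k in range(p))
    return (1 - x) ** p * series

x = np.linspace(0.0, 1.0, 101)

# Largest deviation of P(x) + P(1 - x) from unity, for several orders p.
residual = {p: float(np.max(np.abs(maxflat(x, p) + maxflat(1 - x, p) - 1)))
            for p in (1, 2, 5, 8)}

print(residual)
print(maxflat(0.0, 5), maxflat(1.0, 5))   # end-point values P(0) and P(1)
```

The residuals are at the level of rounding error for every order tried, and the end-point values are P(0)=1 and P(1)=0.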

Next the coefficients of H0 are derived from P; P is the
autocorrelation of H0:

$$P(z) = H_0(z)\,H_0(z^{-1}) \qquad (20)$$

and solving this equation is known as spectral
factorisation. One method is to find all the complex
roots of P, which because it is real and symmetric, has
roots which come in pairs which are mutually reciprocal.
The polynomial H0 is derived by gathering together one
root from each pair whose modulus is less than or equal
to unity. Fig. 3 shows the Daubechies' wavelets with
p=5 and p=8, which become smoother and more
band-limited with higher p.


[Figure: two panels, (a) p=5 and (b) p=8]

Fig. 3. Daubechies' wavelets DB5 and DB8


Orthonormal wavelets and filter banks are very
convenient, but the constraint imposed by Equation (17)
is very restrictive. For example, except for the trivial
case of the Haar scaling function, none of the scaling
functions, wavelets or filters are symmetrical. It is
tempting to use orthogonal wavelets because of their
simple inversion formula, but in many cases this is
unnecessary, as we often do not require the same
coefficients for the analysis and synthesis filters. An
analogy is the use of matrices to solve simultaneous
linear equations. A matrix with a simple, sparse inverse
permits us to solve simultaneous equations easily, but
efficient inversion does not require the additional
constraint of the inverse being equal to the transpose, as
required of orthogonal matrices.

5.2 Biorthogonal and Semi-orthogonal Wavelets


Orthonormal wavelets are the analogue of orthogonal
matrices. Likewise biorthogonal wavelets are the
analogue of invertible matrices. The inverse of the filter
bank is perfect reconstruction, so we still require
Equations (6) to be solved, but now the synthesis filters
F0 and F1 can be very different to the analysis filters H0
and H1. We also have to work with two types of scaling
and wavelet function: one pair for analysis, to calculate
the wavelet coefficients using Equation (2), and a
different pair for synthesis, to reconstruct the signal or
image, using Equation (16). The wavelets and filter
banks are still related by the dilation and wavelet
equations (7,8), but now the analysis functions are
generated by the analysis filters H0 and H1, and the
synthesis wavelets are generated by the synthesis filters
F0 and F1. The perfect reconstruction equations (6)
ensure that these wavelets are biorthogonal, which
means that in Equation (15) one of the wavelets in the
integrand is an analysis wavelet, and the other is a
synthesis wavelet, but otherwise the formula is the same.

Semi-orthogonal wavelets are another important
variety, where wavelets of different scales are
orthogonal, but wavelets of different position are not
always orthogonal. These are useful for interpolation
and approximation of functions. A popular and simple
choice is the spline wavelets, whose scaling functions
are the Haar scaling function (top hat) convolved with
itself n times, and whose lowpass filter has binomial
coefficients. The orthogonality across scale ensures that
the accuracy of approximation for smooth functions
increases with maximum rapidity as scale decreases, but
orthogonality between wavelets of the same scale
imposes undesirable constraints which degrade
approximation.

A  useful  tool  which  has  gained  a  lot  of  attention  recently 
is  lifting,  which  is  a  systematic  and  flexible  method  of 
constructing  biorthogonal  wavelets  and  filter  banks.  The 
idea  is  to  change  H0  to  meet  application-specific  design 
requirements,  whilst  still  satisfying  the  perfect 
reconstruction  of  Equation  (6).  It  turns  out  that  any 
change  to  H0  of  the  following  form  will  achieve  this: 

$$H_0(z) \rightarrow H_0(z) + F_0(-z)\,S(z^2) \qquad (21)$$

for  any  filter  S(z).  We  can  do  a  similar  operation  on  the 
synthesis  filter  F0,  which  is  called  dual  lifting.  Typically 
the  process  of  filter  design  starts  with  a  simple  filter,  for 
example  a  delta  function  or  top  hat,  called  a  “Lazy 
filter”,  and  then  the  processes  of  lifting  and  dual  lifting 
are  iterated  with  suitable  choices  of  S,  until  the  design 
requirements  are  met. 
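
A single lifting step can be checked against the perfect reconstruction equations (6). The sketch below makes several assumptions that are not stated in the text: the starting filter bank is the Haar bank normalised by 1/√2, the filters are linked by the alias-cancelling convention H1(z) = F0(-z) and F1(z) = -H0(-z), and the synthesis filter F1 is updated alongside H0 so that this convention continues to hold. The lifting filter S is an arbitrary illustrative choice. Filters are stored as coefficient arrays in powers of z^-1.

```python
import numpy as np

def neg_z(h):
    """Coefficients of H(-z): flip the sign of the odd-indexed taps."""
    return h * (-1.0) ** np.arange(len(h))

def padd(a, b):
    """Add two polynomials in z^-1 held as coefficient arrays of different lengths."""
    out = np.zeros(max(len(a), len(b)))
    out[:len(a)] += a
    out[:len(b)] += b
    return out

r = 1 / np.sqrt(2.0)
h0, h1 = np.array([r, r]), np.array([r, -r])   # Haar analysis filters
f0, f1 = neg_z(h1), -neg_z(h0)                 # assumed alias-cancelling synthesis filters

s_z2 = np.array([0.25, 0.0, -0.1])             # hypothetical S(z^2): even powers only

# Lifting step: H0(z) -> H0(z) + F0(-z) S(z^2), with F1 updated to match.
h0_new = padd(h0, np.convolve(neg_z(f0), s_z2))
f1_new = -neg_z(h0_new)

distortion = padd(np.convolve(f0, h0_new), np.convolve(f1_new, h1))
alias = padd(np.convolve(f0, neg_z(h0_new)), np.convolve(f1_new, neg_z(h1)))

print(np.round(distortion, 12))
print(np.round(alias, 12))
```

The lifted bank still reconstructs perfectly: the distortion term remains 2z^-1 and the alias term remains zero, whatever values are placed in S.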

5.3  Wavelet  Frames 

Lastly,  we  briefly  mention  wavelet  frames.  The  discrete 
wavelet  transform  and  filter  banks  mentioned  so  far  are 
fully  invertible  transforms,  so  there  is  a  one-to-one 
correspondence  between  the  signal  and  the  output  of  the 
wavelet  transform  or  filter  bank.  This  is  equivalent  to  the 
translations  and  dilations  of  the  mother  wavelet  on  the 
pyramid  grid  being  a  basis;  they  are  linearly  independent 
and  span  the  space  of  signals.  In  wavelet  frames  the 
requirement  for  independence  is  dropped,  which 
typically  involves  oversampling  the  continuous  wavelet 
transform  by  adding  extra  points  to  the  pyramid,  for 
example  by  doubling  the  resolution  in  position  or  by 
halving  scales  between  octaves.  The  wavelets  still  span 
the  signal  space,  so  any  signal  can  be  recovered  from  the 
wavelet  transform.  Not  all  such  functions  of  position  and 
scale  are  wavelet  transforms,  however,  so  wavelet  frame 
transforms  only  have  one-sided  inverses. 

Wavelet  frames  are  generally  more  computationally 
intensive,  as  there  are  additional  coefficients  to  calculate, 
but  objects  such  as  targets  can  be  characterised  more 
flexibly  at  intermediate  positions  and  scales.  Wavelet 
frames  become  more  translation-invariant  as  the 
sampling  density  increases,  as  they  are  better 
approximations  to  the  continuous  wavelet  transform. 

To  summarise,  there  is  a  wide  variety  of  wavelets  and 
filter  banks  available  for  signal  analysis,  each  with  its 
own  strengths  and  weaknesses.  Although  it  is  tempting 
to use the first family of wavelets that springs to mind,
for example the popular Daubechies wavelets, there may
be  others  more  suitable  for  the  application.  There  are 
also  design  techniques,  such  as  lifting,  to  customise 
wavelets,  should  off-the-shelf  varieties  not  suffice. 

6.  Wavelet  Analysis  in  Higher  Dimensions 

The techniques described in Sections 3-5 are applicable
to 1-dimensional signals. In higher dimensions there are
two approaches to wavelet analysis: either to use
separable filters, which can be derived easily from 1D
filters using exterior products, or non-separable filters,
which have to be designed from scratch, which is more
difficult.

6.1  Separable  Wavelets 

Separable  functions  of  several  variables  are  Cartesian 
products  of  functions  of  fewer  variables: 

$$f(x_1, x_2, \ldots, x_n) = f_1(x_1)\,f_2(x_2) \cdots f_n(x_n) \qquad (22)$$

where in general the arguments x_k can be vectors as well
as scalars. Exterior products of scaling functions and
wavelets make effective higher-dimensional wavelets,
inheriting all the properties of their lower-dimensional
components. To simplify the notation, we will consider
exterior products of two 1-dimensional wavelets to
facilitate image wavelet analysis, but the principles
behind higher-dimensional wavelet analysis are
identical.

Image wavelet analysis involves one lowpass filter H0
and three highpass filters, H1, H2, H3, each of which is
the exterior product of 1-dimensional lowpass or
highpass filters:

$$h_0(m,n) = h_0(m)\,h_0(n) \qquad h_1(m,n) = h_0(m)\,h_1(n)$$
$$h_2(m,n) = h_1(m)\,h_0(n) \qquad h_3(m,n) = h_1(m)\,h_1(n) \qquad (23)$$

Similarly there is one scaling function and three wavelet
functions formed as exterior products of their
1-dimensional counterparts:

$$\phi(x,y) = \phi(x)\,\phi(y) \qquad \phi_1(x,y) = \phi(x)\,w(y)$$
$$\phi_2(x,y) = w(x)\,\phi(y) \qquad \phi_3(x,y) = w(x)\,w(y) \qquad (24)$$

The multiresolution filter bank has four outputs at each
scale; the lowpass output is downsampled and goes to
the next resolution, and the other 3 outputs are the detail
or wavelet coefficients, as for the 1-dimensional case.
The three types of wavelet are usually regarded as
having horizontal, vertical and diagonal orientation.
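
One level of the separable scheme can be written compactly. The sketch below assumes the two-tap Haar filters, builds the four 2D filters as outer products in the order of Equation (23), and applies each to the non-overlapping 2×2 blocks of the image, which is equivalent to filtering followed by downsampling by two in each direction. The dictionary keys, the function name and the test image are illustrative only.

```python
import numpy as np

h_low = np.array([0.5, 0.5])
h_high = np.array([0.5, -0.5])

# Two-dimensional filters as outer products of the 1-dimensional filters.
filters = {
    "h0": np.outer(h_low, h_low),     # lowpass in both directions
    "h1": np.outer(h_low, h_high),
    "h2": np.outer(h_high, h_low),
    "h3": np.outer(h_high, h_high),   # highpass in both directions
}

def analyse(image):
    """One analysis level: inner product of each 2x2 block with each 2x2 filter."""
    rows, cols = image.shape
    blocks = image.reshape(rows // 2, 2, cols // 2, 2)   # blocks[i, m, j, n] = image[2i+m, 2j+n]
    return {name: np.einsum("imjn,mn->ij", blocks, k) for name, k in filters.items()}

image = np.arange(16.0).reshape(4, 4)    # hypothetical test image: a linear ramp
bands = analyse(image)

for name, band in bands.items():
    print(name, band.tolist())
```

The four outputs together contain as many coefficients as the original image. For the linear ramp used here the doubly highpass output vanishes, since a ramp has no diagonal detail.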

The discrete wavelet transform is usually displayed as
shown in Fig. 4, though this representation can be
misleading. In this representation the density of wavelet
coefficients is kept constant, with larger regions required
to store information at high resolution (low scale). The
number of wavelet coefficients at any scale is three times
the number at all larger scales, because there are three
highpass filters to one lowpass filter. The regions are
designated LL (lowest resolution only), HL, LH and HH
according to which combination of 1-dimensional filters
is used in the Cartesian product. This representation is
convenient, because the transform has the same shape
and number of pixels as the original image (an example
is shown in Fig. 5), but the larger scales are portrayed as
being smaller in size! It is true that the downsampling
operator has this effect, but a more natural interpretation
is that the wavelet filters increase in scale.


[Figure: sub-band layout of the transform, with regions labelled LL, HL, LH and HH]

Fig. 4. Image wavelet display


Fig.  5.  Wavelet  decomposition  of  ‘Lena’  -  2  levels 


6.2  Non-Separable  Wavelets 


The  alternative  approach  to  image  wavelet  analysis  is  to 
use  non-separable  wavelets.  Although  more  difficult  to 
design,  these  can  be  more  flexible,  especially  in 
orientation.  The  image  pyramid  grids  and  resampling  do 
not  need  to  be  rectangular  or  separable,  either.  An 
example is given in Fig. 6, where the small and large
dots together make up the grid at one resolution, and the
large dots alone make up the sub-grid at the next lower
resolution.


[Figure: grid of small and large dots; the large dots mark the sub-grid retained at the next lower resolution]


Fig.  6.  Non-separable  grid 


In  this  example  the  change  in  area  and  the  resampling 
factor  between  scales  is  not  4  as  it  would  be  in  the 
separable  case,  but  2,  so  there  is  only  one  highpass  filter 
required, as for the 1-dimensional signal case. In this
case  resampling  causes  a  rotation  through  45°.  This  is 
known  as  quincunx  resampling.  Hexagonal  grids  can 
also  be  used,  which  permits  wavelets  with  60° 
orientation  intervals  to  be  constructed.  Even  more  exotic 
wavelet  grids  have  become  popular  in  the  interpolation 
of  complex  geometric  surfaces  [7],  which  is  a  very 
active  research  topic. 

7.  Wavelet  Packets 

In  conventional  wavelet  analysis  the  main  source  of 
variety  in  the  transform  is  in  translation  and  dilation. 
One  or  a  very  small  number  of  filters  is  involved,  except 
for  differences  in  position  and  scale.  This  limits  the 
variety  of  information  that  individual  wavelet 
coefficients  represent.  Another  approach  that  has  gained 
popularity  in  recent  years  is  that  of  wavelet  packets, 
where  the  functions  used  to  represent  the  signal  or  image 
vary in shape also. Typically frames are used instead of
bases, initially providing redundancy, but then a subset
of the coefficients is selected to derive a basis which is
adapted to fit the incoming data.

One way to do this is to extend the sub-band coding to
encompass any dyadic tree structure. In conventional
wavelet analysis it is only the lowpass filter output that is
split further by downsampling and bandpass filtering; the
output from the highpass filter is left alone. In a more
general dyadic tree, the decision to split the channel is
applied more arbitrarily, to yield a wide variety of
transforms. Fig. 7 shows some examples of dyadic trees.
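
The extreme case of a dyadic tree, in which every channel is split at every level, can be sketched as follows. The example assumes the two-tap Haar filters and a signal of length eight, and recovers the analysing functions by transforming each unit impulse in turn; the function names are illustrative.

```python
import numpy as np

def split(x):
    """One two-channel Haar stage: lowpass and highpass outputs, each downsampled by 2."""
    return (x[0::2] + x[1::2]) / 2, (x[0::2] - x[1::2]) / 2

def complete_tree(x, levels):
    """Split every channel, lowpass and highpass alike, at every level."""
    channels = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        channels = [half for c in channels for half in split(c)]
    return np.concatenate(channels)

n = 8
# Row i holds the weights that output coefficient i applies to the input samples (scaled by n).
basis = np.array([complete_tree(e, 3) for e in np.eye(n)]).T * n

print(basis.astype(int))
```

Every row consists of the values +1 and -1 only, and the rows are mutually orthogonal, so the complete Haar tree yields a set of square-wave functions of the Walsh type.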


[Figure: three dyadic trees, labelled wavelet tree, complete tree and wavelet packet tree]


Fig. 7. Dyadic tree structures

The complete dyadic tree divides all branches, resulting
in an equal partition in the Fourier domain, analogous to
the short-time Fourier transform which divides the signal
into a set of time-frequency cells of identical duration
and frequency bandwidth. If we apply the complete tree
to the Haar filter, for example, we get the Walsh
functions, shown in Fig. 8.


Fig. 8. Walsh functions

In general the aim of wavelet packet analysis is to
approximate the signal or image by a series of functions
chosen from a large set, called a dictionary, for example
the functions generated by all dyadic trees. The functions
are chosen to give the best approximation with the
smallest number of components. The larger the
dictionary, the more computation required, but also the
greater the potential for an efficient representation. An
alternative approach is to extend the transformations
which generate the wavelet basis beyond translation and
dilation to include shape changes, for example frequency
shifts and chirp angles (chirplets [8]), or in the case of
image wavelets, affine transformations (ridgelets [9]).

There are also different approaches to choosing the
functions from the dictionary to approximate the signal
or image. One method is the best basis algorithm [10]
which selects functions from a union of several bases.
Another is matching pursuit [11], where wavelets are
selected from a large dictionary (e.g. generated by
translation, dilation and frequency shifts) in the order
that most rapidly decreases the approximation error, and
at each stage the chosen function is subtracted from the
signal or image. Another method [12] is selection from a
continuum of functions analogous to the continuous
wavelet transform, searching for local maxima in
correlation with the signal or image, but where
parameters are not limited to position and scale, or to a
discrete grid. A conjugate gradient search is used to
refine the wavelet parameters after an initial grid search,
enabling the wavelets to fit the signal or image data
more accurately, and achieving invariance with respect
to translation, dilation and related operations.


9. References

1. G. Strang and T. Nguyen, Wavelets and Filter
Banks, Wellesley-Cambridge Press, Rev. Ed., 1997.

2. Y. Meyer, Wavelets, Algorithms and Applications,
SIAM, Philadelphia, 1993.

3. J. Feder, Fractals, Plenum Press, 1988.

4. A.K. Jain, “Image Data Compression: A Review”,
Proc. IEEE, 69, pp. 349-389, 1981.

5. T.H. Koornwinder (ed), Wavelets: An Elementary
Treatment of Theory and Applications, World
Scientific, 1993.

6. G.H. Watson, “The Detection of Unusual Events in
Cluttered Natural Backgrounds”, NATO RTA lecture
series 216, Application of Mathematical Signal
Processing Techniques to Mission Systems, 1999.

7. A.W.F. Lee et al, “MAPS: Multiresolution Adaptive
Parameterisation of Surfaces”, Computer Graphics
Proceedings (SIGGRAPH 98), pp. 95-104, 1998.

8. S. Mann and S. Haykin, “The Chirplet Transform:
Physical Considerations”, IEEE Trans. on Signal
Processing, 43(11), Nov 1995.

9. E.J. Candes, “Ridgelets: Theory and Applications”,
PhD Thesis, Dept of Statistics, Stanford University,
1998.

10. R.R. Coifman and M.V. Wickerhauser, “Entropy-Based
Algorithms for Best Basis Selection”, IEEE
Trans. on Information Theory, 38, pp. 713-718, 1992.

11. S.G. Mallat, “A Theory of Multi-Resolution Signal
Decomposition: Wavelet Decomposition”, IEEE
PAMI, Vol. 11, pp. 674-693, 1989.

12. G.H. Watson and K. Gilholm, “Signal and image
feature extraction from local maxima of generalised
correlation”, Pattern Recognition 31(11), pp. 1733-1745,
Nov 1998.

© British Crown copyright 1999. Published with the
permission of the Defence Evaluation and Research
Agency on behalf of the Controller of HMSO.




The  Detection  of  Unusual  Events  in  Cluttered  Natural  Backgrounds 

G.H.Watson 

Room 1052, A2 Building, DERA Farnborough, Ively Road, Farnborough, Hants, GU14 0LX, UK


1.  Introduction 

This paper is concerned with the use of wavelet analysis
and statistical models of natural backgrounds as a means
of detecting unusual events within them, in particular targets
of military interest. The underlying principle is to detect
targets  as  objects  that  stand  out  from  the  background, 
and  hence  are  unusual,  rather  than  searching  for  objects 
with  prescribed  characteristics  and  dealing  with  clutter  as 
an  afterthought.  First  a  method  of  feature  extraction  is 
described  based  on  wavelet  analysis  which  is  used  to 
characterise  both  backgrounds  and  unusual  events.  Then 
the  statistics  of  these  features  for  natural  backgrounds  are 
considered,  making  use  of  fractal  geometry,  from  which 
basic  clutter  rejection  can  be  implemented.  More 
advanced  clutter  rejection  methods  are  then  considered, 
based  on  the  multivariate  statistics  of  additional 
measurements.  Three  cases  are  considered  in  detail:  the 
wavelet  analysis  of  multispectral  data,  the  use  of  local 
variance  to  reject  clutter  in  intermittent  backgrounds,  and 
the  use  of  temporal  variability  to  reject  clutter  in  image 
sequences. 

The  approach  of  modelling  the  background,  rather  than 
the  target,  has  the  advantage  that  little  or  no  prior 
knowledge  of  the  latter  is  required,  leading  to  greater 
flexibility  and  robustness.  Target  prior  can  be  added  at  a 
later  stage,  if  available,  for  further  discrimination  and 
clutter  rejection.  In  some  military  circumstances  early 
warning  of  targets  is  required  before  any  detailed 
structure  can  be  resolved,  the  limiting  case  being  point 
targets  with  a  single-pixel  signature.  In  such  cases  target 
prior  is  of  little  use  in  recognition,  being  limited  to  the 
time  signature  of  a  single  pixel,  so  the  use  of  background 
context  can  be  critical  to  early  detection. 

The  method  to  be  described  comprises  the  following 
stages: 

(a)  Decompose  the  signal  or  image  data  into  a  set  of 
discrete  features  which  are  suitable  as  an  ensemble 
for  characterising  both  targets  and  the  background. 
These  features  are  generally  simple  geometric 
shapes  to  facilitate  their  extraction,  such  as  blobs 
and  bars  in  images,  but  which  can  be  combined  to 
characterise  more  complex  objects,  such  as  roads 
and  cloud  edges.  These  features  are  usually 
extracted  at  multiple  scales,  using  wavelet  analysis. 


(b)  Construct  a  statistical  model  of  the  background 
based  on  the  above  feature  decomposition.  Most 
natural  backgrounds  are  difficult  to  model,  having 
strongly  non-Gaussian  statistics  and  phase 
correlation,  for  example  in  the  form  of  strong  edges. 
In  general  the  joint  statistics  of  feature  parameters 
such  as  brightness,  position,  scale  and  orientation 
need  to  be  calibrated,  resulting  in  multidimensional 
probability  distributions.  However,  most  natural 
backgrounds  are  stationary  and  exhibit  fractal 
geometry,  which  simplifies  the  statistical  modelling. 

(c)  Extract  potential  targets  as  statistical  outliers,  that  is 
at  the  edges  or  tails  of  the  background  distribution. 
Each object is assigned a prior probability that it
belongs to the background, and hence is not a target.

(d)  If  additional  target  prior  is  available,  use  Bayes’ 
formula  to  combine  the  prior  distributions  of  targets 
and  the  background  to  estimate  the  a  posteriori 
probability  (likelihood)  of  there  being  a  target.  This 
topic  is  not  covered  in  this  paper. 

This method can be applied to a wide variety of data,
including 1-dimensional signals (e.g. acoustic data),
2-dimensional images (including multispectral imagery),
3-dimensional images (e.g. medical tomography), and
time-sequenced imagery, where movement is part of the
feature  characterisation.  The  method  is  only  limited  by 
the  methods  of  feature  extraction  available,  and  the 
accuracy  of  the  background  statistical  models. 

The  remainder  of  this  paper  is  organised  as  follows. 
Section  2  describes  methods  of  feature  extraction,  based 
on  searching  for  local  extrema  in  the  wavelet  transform 
and  analogous  correlation,  and  explains  the  relationship 
between  this  and  matched  filtering.  Section  3  describes 
how  the  statistics  of  these  wavelet  features  are  calibrated 
with  the  aid  of  fractal  geometry.  Section  4  explains  how 
improved  clutter  rejection  can  be  implemented  by 
introducing  additional  random  variables,  and  gives  three 
examples:  multispectral  imagery,  strongly  intermittent 
backgrounds,  and  image  sequence  clutter  rejection  based 
on  space-time  filters. 


2.  Feature  Extraction 

As explained in Section 1, the purpose of feature
extraction is to decompose the signal or image into a set
of discrete geometric components which characterise
both targets and background sufficiently
well that the former can be recognised as unusual events.
This involves making measurements of the data for
which targets might have unusual values, which in signal
processing parlance means applying filters to the data.
The best understood filters are linear, and these are
considered in this section; Sections 4.2 and 4.3 give
examples of non-linear filters which provide further
discrimination of targets and clutter.


2.1  Features  from  Matched  Filters 

Matched filter theory [1,2] can be used to derive the
optimum linear filter (matched filter) to detect any
prescribed object, in the sense that signal to noise ratio
(SNR) is maximised. If $f$ is the filter, $x$ is the target
and $n$ is the background, then SNR is defined to be:

$$\mathrm{SNR} = \frac{|f \cdot x|}{\sqrt{E\left(|f \cdot n|^2\right)}} \qquad (1)$$

where $E$ denotes expectation. In this section it will also
be  assumed  that  the  signal  or  image  data  is  stationary, 
that  is  statistically  translation-invariant.  This  is  usually 
the  case  for  time-varying  signals  and  imagery  projected 
as  a  plan  view,  but  there  is  often  a  statistical  dependence 
on  the  vertical  image  co-ordinate  for  forward-looking 
imagery.  The  latter  situation  is  considered  in  Section  4.3. 
For  translation-invariant  data  the  filters  should  be 
translation-invariant  to  avoid  statistical  bias,  which 
implies linear filtering is equivalent to convolution. In
such cases the Fourier transform of the matched filter
$F(\omega)$ is given by:

$$F(\omega) = \frac{\overline{X(\omega)}}{N(\omega)} \qquad (2)$$

where  X  is  the  Fourier  transform  of  the  target,  the  bar 
denotes  complex  conjugation,  and  N  is  the  power 
spectral  density  (PSD)  of  the  background.  The  inverse 
Fourier  transform  can  be  used  to  derive  the  convolution 
kernel  in  the  signal  or  image  space. 
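
As an illustration of equation (2), the following sketch builds a matched filter in the Fourier domain and applies it by convolution. It assumes a 1-dimensional, uniformly sampled signal with periodic boundaries; the function names, the target signature and the noise level are hypothetical choices made for the example and are not part of the method described here.

```python
import numpy as np

def matched_filter_kernel(target, noise_psd):
    """Convolution kernel whose Fourier transform is conj(X) / N, as in equation (2)."""
    X = np.fft.fft(target)
    return np.real(np.fft.ifft(np.conj(X) / noise_psd))

def apply_filter(data, kernel):
    """Circular convolution of the data with the kernel, computed with the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(data) * np.fft.fft(kernel)))

rng = np.random.default_rng(0)
n = 256
target = np.zeros(n)
target[:5] = [1.0, 2.0, 3.0, 2.0, 1.0]    # hypothetical target signature
white_psd = np.ones(n)                     # flat PSD: uncorrelated background

kernel = matched_filter_kernel(target, white_psd)

# Embed the target at a known offset in weak white noise and filter the result.
offset = 100
data = 0.1 * rng.standard_normal(n) + np.roll(target, offset)
response = apply_filter(data, kernel)
peak = int(np.argmax(response))            # position of the strongest response
```

For the flat PSD used here the kernel is simply the mirror image of the target, so the filter reduces to correlation with the target shape. A PSD that decreases with frequency could be passed in place of `white_psd` to emphasise higher frequencies.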

Matched  filters  are  simple  and  effective  when  the  target 
configuration  is  simple,  for  example  a  point  target,  as  the 
number  of  possible  target  configurations  may  be  small. 
Where  the  target  is  more  complex,  either  because  spatial 
structure  can  be  resolved  or  because  its  trajectory  is 
varied,  the  number  or  complexity  of  matched  filters 
makes  their  implementation  more  difficult.  For  example, 
an  aircraft  may  be  viewed  from  many  ranges  and  aspects, 
each  requiring  a  different  matched  filter.  In  this  paper 
matched  filters  are  designed  instead  to  detect  simple 
geometric  structures  which  are  suitable  for  characterising 
parts  of  targets  or  backgrounds.  For  example,  if  the 
target is a missile, we may choose a bar-shaped
component; this will not fit the missile exactly, but gives
a  fairly  good  approximation  to  the  missile  body  and  the 
plume.  If  we  choose  very  simple,  generic  components, 
we  will  be  able  to  pick  up  a  wide  variety  of  objects,  but 
there  will  be  some  loss  in  detection  sensitivity  when  the 
SNR  is  very  low. 


2.2  Translation  and  Dilation  Invariance 

Convolution  involves  correlating  the  kernel  with  the  data 
and  repeating  this  operation  over  all  translations  of  the 
kernel.  Translation  is  usually  required  because  there  is 
little  or  no  knowledge  of  where  the  target  is  located  in 
space  or  time  prior  to  its  detection.  It  is  fortunate  that 
many  backgrounds  are  also  statistically  translation- 
invariant,  otherwise  translation  of  the  filter  kernel  would 
not give consistent answers. Non-stationary
backgrounds, such as in forward-looking imagery, are a
common source of false alarms for this reason.


In  many  situations  the  scale  of  the  target  signature  is  also 
not  known  a  priori,  primarily  because  it  is  affected  by 
the  distance  to  the  sensor.  Thus  multiresolution  analysis 
is  required,  where  the  filtering  is  repeated  over  varying 
scale  as  well  as  varying  position.  This  is  why  wavelet 
analysis  is  a  useful  tool  in  target  recognition.  Even  when 
there  is  prior  knowledge  of  target  size,  the  geometric 
features  used  in  its  analysis  may  be  of  varying  scale,  for 
example  a  missile’s  guidance  fins  are  usually  much 
smaller  than  its  fuselage. 


Many  backgrounds  are  statistically  scale-invariant, 
though  this  is  not  as  intuitively  obvious  as  translation- 
invariance.  In  many  images  of  familiar  scenes,  such  as 
natural  terrain,  the  scale  or  magnification  is  hard  to 
discern  unless  there  is  a  reference  object  of  known  size, 
such  as  an  adult  human.  This  phenomenon  is  the  subject 
of  fractal  geometry  [3,4],  where  scale-invariance  is 
known as self-similarity. Strictly speaking self-similarity
means that there is a scaling transformation under which
the signal or image is identical to a subset of itself, where
the scaling transformation is given by:

$$s^{-h} f(sx) = f(x) \qquad (3)$$

where $f$ is the signal or image being transformed, $s$ is the
scale or dilation factor and the constant $h$ is sometimes
called the self-similarity parameter. When modelling
natural backgrounds the equality of equation (3) is
replaced by statistical invariance. The response of
statistically self-similar backgrounds to filters at different
scales can be normalised by dividing by $s^h$ [5], allowing
the same threshold to result in the same false alarm rate
over a range of scales. We will return to the use of fractal
geometry in modelling the statistics of backgrounds in
Section 3.

2.3 The Wavelet Transform

So  far  we  have  established  that  correlation  should  be 
combined  with  translation  and  dilation,  resulting  in 
wavelet  analysis,  and  that  the  background  is  often 
translation  and  scale  invariant,  which  means  that  the 
statistics  of  the  wavelet  transform  should  be  uniform. 
However, care must be taken to employ the correct scale
normalisation factor $s^h$, to ensure uniformity. The
standard definition of the continuous wavelet transform
is consistent with $h = -1/2$, the self-similarity parameter for
uncorrelated backgrounds (white noise):

$$T(p,s) = s^{-1/2} \int f(x)\, g\!\left(\frac{x-p}{s}\right) dx \qquad (4)$$

but in general, for constant false alarm rate (CFAR)
detection, the formula should be modified to:

$$T(p,s) = s^{-1-h} \int f(x)\, g\!\left(\frac{x-p}{s}\right) dx \qquad (5)$$
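
A minimal numerical sketch of equation (5) is given below, assuming a 1-dimensional signal with unit sample spacing. The Mexican hat wavelet, the truncation of its support at five scale lengths and the function names are illustrative assumptions rather than choices stated in this paper.

```python
import numpy as np

def mexican_hat(u):
    """Negated second derivative of a Gaussian, used here as the analysing wavelet g."""
    return (1.0 - u**2) * np.exp(-0.5 * u**2)

def wavelet_transform(f, scales, h, g=mexican_hat):
    """Discretised T(p, s) = s**(-1 - h) * sum_x f(x) * g((x - p) / s).

    Unit sample spacing is assumed. Row i of the result holds T(., scales[i]).
    """
    f = np.asarray(f, dtype=float)
    T = np.empty((len(scales), len(f)))
    for i, s in enumerate(scales):
        half = int(np.ceil(5 * s))               # truncate the wavelet at |u| = 5
        if 2 * half + 1 > len(f):
            raise ValueError("scale too large for the signal length")
        kernel = g(np.arange(-half, half + 1) / s)
        # Expanded filter correlated with the data at full resolution (no downsampling).
        T[i] = s ** (-1.0 - h) * np.correlate(f, kernel, mode="same")
    return T

# White noise has h = -1/2, so the spread of T should be similar at every scale.
rng = np.random.default_rng(1)
noise = rng.standard_normal(8192)
scales = [2.0, 4.0, 8.0, 16.0]
spread = wavelet_transform(noise, scales, h=-0.5).std(axis=1)
```

With $h = -1/2$ the prefactor reduces to the standard $s^{-1/2}$ of equation (4); other values of $h$ rescale each row so that a self-similar background gives a comparable response at every scale.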


The shape of the analysing wavelet $g$ can be derived as
the matched filter of the geometric feature which will be
used to characterise the signal or image data, at a chosen
position and scale. If more than one type of feature is
used, for example radial basis functions and oriented
bars, then more than one analysing wavelet is required,
and more than one wavelet transform calculated.

The use of matched filter theory to derive appropriate
analysing wavelets shows that the latter depend on both
the shape of the geometric features used to characterise
the data, and on the properties of the background, in
particular its PSD. For uncorrelated backgrounds the
matched filter is the same as the geometric feature,
except for a reversal in space and time (a mirror image).
For most correlated backgrounds the PSD decreases with
frequency, so the matched filter is similar to the feature,
except higher frequencies are emphasised, which has the
effect of introducing side-lobes. For self-similar
backgrounds the PSD is a (usually negative) power of
frequency [5], and the matched filter is thus a fractional
derivative [6] of the feature. For example, in Brownian
noise the matched filter is the time-reversed second
derivative of the feature, and the equivalent operator for
rotationally-symmetric image backgrounds is the
Laplacian operator. In the latter case the matched filter
for a Gaussian radial basis function is its second
derivative, which is similar to the Difference of Gaussian
(DoG) filter so popular in target detection.

2.4 Local Maxima

The final stage in feature extraction is searching for local
maxima in the absolute value of the wavelet transform,
as this enables the data to be decomposed into a discrete
set of objects, and reduces redundancy. Restricting
measurements to local maxima, rather than taking them at all
points above a threshold, prevents redundant features being
generated from the same object in its neighbourhood. At
each local maximum, its location in wavelet space
(position and scale), and the type of analysing wavelet
(e.g. matched to a blob or bar) are recorded, as well as
the wavelet transform value itself. These values provide
a lot of information about targets and backgrounds, for
example the location, scale, orientation and grey-level
contrast of edges and bars, and an approximate
reconstruction is available by linear superposition of the
features whose parameters match the local maxima. In
the context of target detection, it is more useful to retain
only those features whose amplitude (wavelet transform
value) exceeds a threshold corresponding to a prescribed
probability (Section 3). These features provide concise
information about the target and effective clutter
rejection, from which a partial reconstruction of the
target can be obtained. Examples of feature extraction
and partial reconstruction are given in Fig. 1,
demonstrating the ability to represent artefacts such as
roads and buildings, whilst rejecting most clutter.


The  method  of  searching  for  local  maxima  depends  on 
the  application.  A  search  restricted  to  the  positions  and 
scales  of  the  discrete  wavelet  transform  is  quick  to 
implement,  but  the  resulting  features  are  limited  by  the 
poor  resolution  of  the  pyramid  grid  of  positions  and 
scales.  This  type  of  feature  extraction  is  also  not  truly 
translation  and  scale  invariant,  because  most  shifts  in 
position  and  scale  cause  the  wavelet  grid  to  change.  This 
drawback  is  mitigated  if  a  wavelet  frame  is  used  with  a 
higher  resolution  in  position  and  scale,  which  is 
equivalent  to  interpolating  the  wavelet  transform.  Better 
still,  but  more  costly,  is  to  refine  the  positions  and  scales 
of  the  features  thus  found  by  evaluating  the  continuous 
wavelet  transform  explicitly  using  expanded  filters 
instead  of  downsampling  the  signal  or  image,  and  then 
optimising  this  function  using  a  local  search,  for  example 
a  conjugate  gradient  search  [7].  This  method  results  in 
true  translation  and  scale  invariance. 
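
The simplest of the search strategies above, a search restricted to a discrete grid of positions and scales, can be sketched as follows. The sketch assumes a 1-dimensional signal whose transform is stored as a 2-dimensional array indexed by (scale, position); the function name and the 8-neighbour test are assumptions for illustration, and the refinement by conjugate gradient search is not attempted.

```python
import numpy as np

def local_maxima(T, threshold=0.0):
    """Return (scale_index, position, value) for strict local maxima of |T|.

    A point qualifies when |T| is strictly greater than all 8 neighbours in the
    (scale, position) grid and exceeds `threshold`. Grid borders are skipped.
    """
    A = np.abs(T)
    core = A[1:-1, 1:-1]
    is_max = core > threshold
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            neighbour = A[1 + di : A.shape[0] - 1 + di, 1 + dj : A.shape[1] - 1 + dj]
            is_max &= core > neighbour
    rows, cols = np.nonzero(is_max)
    return [(int(a) + 1, int(b) + 1, float(T[a + 1, b + 1])) for a, b in zip(rows, cols)]
```

Each returned tuple is one discrete feature: its scale index, its position and the signed wavelet transform value. Features found this way inherit the resolution of the grid, which is the limitation noted in the paragraph above.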


3.  Background  Statistics  and  Fractal 
Geometry 

3.1  Threshold  Exceedance  Model 

Once  geometric  features  have  been  extracted  from  the 
data  (Section  2),  the  next  stage  is  to  discriminate 
between  potential  targets  and  the  background,  based  on 
the  statistics  of  the  latter.  First  we  describe  a  simple 
statistical  model  for  backgrounds  with  fractal  geometry 
which  predicts  the  rate  at  which  feature  amplitude 
(wavelet  transform  value)  exceeds  any  prescribed 
threshold $y$ as a function of scale $s$. Jones [8,9,10] has
shown that amplitude threshold exceedance rates are
proportional to a power of scale, $s^{-D}$, where $D$ is a fractal
dimension describing the self-similarity of feature
population, which is different from $h$ (Section 2), describing
the self-similarity of feature amplitude. This self-similarity
enables a single scale-independent distribution
to represent the joint statistics of amplitude and scale:

$$N_y s^{D} = F(y) \qquad (6)$$

where $N_y$ is the threshold exceedance rate (number of
features whose amplitude exceeds the threshold $y$), and $F$
is a scale-independent function defining the shape of the
distribution.

The threshold exceedance statistics in [8] and for the
data analysed in this paper are approximated well by the
generalised exponential function $F(y) \propto \exp\left(-(y/\beta)^{c}\right)$ where $c$ and
$\beta$ are constants. This leads to the following model for
threshold exceedance statistics:

$$N_y = \alpha\, s^{-D} \exp\left(-(y/\beta)^{c}\right) \qquad (7)$$

where:

• $N_y$ is the number of local extrema at scale $s$
whose amplitude $T(p,s)$ exceeds the threshold
value $y$. $N_y$ as a function of $y$ is closely related
to the cumulative probability distribution of
wavelet amplitude $T$ at the local extrema.

• $D$ is a fractal dimension, and represents the
scaling behaviour of the population density of
local extrema with respect to position. For
statistically stationary signals the fractal
dimension is equal to the topological
dimension, that is 1 for signals, 2 for images
and 3 for image sequences or tomography. For
intermittent self-similar data the fractal
dimension can be less than the topological
dimension because of the hierarchical clustering
of local extrema, where at any scale a cluster is
formed of several smaller clusters with
intervening gaps. The Cantor set is an example
of a 1D signal with a fractal dimension of less
than unity.

• $\alpha$ is called lacunarity, which is a measure of
overall population density of local extrema in a
fractal.

• $\beta$ is a measure of overall intensity, which in a
stationary process would be proportional to the
standard deviation of the signal or image.

• $c$ is an inverse measure of the strength of the
tails in the distribution of wavelet amplitude.
For example $c = 1$ represents an exponential
distribution with moderately strong tails,
whereas $c = 2$ represents a Gaussian distribution
with weak tails.

Equation  (7)  can  be  used  to  assign  an  a  priori  probability 
that  a  geometric  feature  arises  from  the  background,  and 
by  setting  a  threshold  on  this  probability,  driven  by  the 
highest  acceptable  false  alarm  rate,  the  detection  of 
targets  as  unusual  events  is  possible.  For  backgrounds 
with  translation  and  rotation  invariance,  we  thus  have  a 
simple  method  of  target  detection  with  no  bias  in 
position,  scale  or  orientation. 
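
To make the thresholding step concrete, the sketch below evaluates a generalised exponential exceedance model and inverts it to find the amplitude threshold for a given acceptable false alarm rate. The exact functional form, $N_y = \alpha s^{-D} \exp(-(y/\beta)^c)$, is an assumption reconstructed from equation (6) and the parameter definitions above, and the function names are hypothetical.

```python
import math

def exceedance_rate(y, s, alpha, beta, c, D):
    """Expected number of background features at scale s with amplitude above y."""
    return alpha * s ** (-D) * math.exp(-((y / beta) ** c))

def threshold_for_rate(rate, s, alpha, beta, c, D):
    """Invert the model: amplitude threshold that gives the requested exceedance rate."""
    ratio = alpha * s ** (-D) / rate
    if ratio <= 1.0:
        return 0.0           # the requested rate is already met with no threshold
    return beta * math.log(ratio) ** (1.0 / c)
```

Under this assumed form the threshold grows with $\beta$ and, for a fixed false alarm rate, falls as scale increases when $D$ is positive, because fewer background features are expected at coarse scales.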

3.2  Calibration  of  the  Model 

To use equation (7) we need to know the values of the
parameters $D$, $\alpha$, $\beta$, $c$ and the self-similarity parameter $h$
(Section 2). In some cases there is prior knowledge of
some or all of these parameters, for example there are
theoretical reasons for assuming a self-similarity
parameter of $h = 1/3$ for fully-developed turbulent flow
[11]. More often it is better to estimate these parameters
from  the  signal  or  image  data.  This  can  be  done  off-line 
as  a  separate  training  process,  with  the  advantage  that 
there  will  probably  be  independent  assurance  that  targets 
are  missing  from  the  data.  On-line  calibration  has  the 
advantage  that  the  resulting  detection  algorithm  can 
adapt  to  changing  background  conditions,  for  example 
caused  by  changes  to  the  weather. 

We now describe a simple and efficient method of
estimating $D$, $\alpha$, $\beta$, $c$ and $h$, making frequent use of linear
least-squares regression. The data used in this statistical
estimation is assumed to be a set of feature amplitudes
and scales $(y_k, s_k)$. First, an approximate value of
self-similarity $h_0$ is used to calculate the modified wavelet
transform of equation (5). This can be estimated with
reasonable accuracy from the slope of the PSD plotted in
log-log co-ordinates; in theory the PSD should be
proportional to $s^{2h+1}$ [5]. The self-similarity parameter
will be refined based on estimation from feature
amplitudes and scales. The initial guess only weakly
affects the location and density of local extrema in the
wavelet transform, so high accuracy is not required.

Next,  threshold  exceedance  counts  Ny  are  derived, 
conditioned  by  scale,  for  a  range  of  scales  s  and 
amplitudes  y.  The  conditioning  of  scale  depends  on  the 
method  of  searching  for  local  extrema  in  the  wavelet 
transform.  If  the  search  is  limited  to  values  of  scale  on  a 
discrete  grid,  then  these  values  are  used  to  partition  the 
data,  and  threshold  exceedance  counts  are  computed  for 
each  partition  set.  On  the  other  hand,  if  scales  are 
refined,  for  example  using  a  gradient  search,  it  is 
necessary  to  partition  scales  into  a  set  of  intervals. 
Threshold  exceedance  counts  Ny  are  derived  for  a  range 
of  values  y  for  each  scale  or  scale  interval.  An  efficient 
way  of  doing  this  for  every  value  of  y  in  the  geometric 
feature set is to sort the amplitudes of the latter into
descending order; the value Ny for each y is then its index
in  the  sorted  array. 
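
The sorting step can be sketched in a few lines. The function names and the use of explicit interval edges to partition refined scales are assumptions for illustration. Counts here are 1-based ranks, so each count includes the feature itself.

```python
import numpy as np

def exceedance_counts(amplitudes):
    """Sort amplitudes into descending order; N_y for each y is its 1-based rank."""
    y = np.sort(np.asarray(amplitudes, dtype=float))[::-1]
    n_y = np.arange(1, len(y) + 1)
    return y, n_y

def counts_by_scale(amplitudes, scales, scale_edges):
    """Partition features into scale intervals and count exceedances within each."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    bins = np.digitize(scales, scale_edges)
    return {int(b): exceedance_counts(amplitudes[bins == b]) for b in np.unique(bins)}
```

When the search is limited to a discrete grid of scales, the grid values themselves can serve as the partition and `scale_edges` is not needed.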

The  next  step  is  to  assign  a  weight  to  each  pair  of  (y,  Ny) 
which  reflects  its  relative  importance  when  fitting  the 
model  given  by  equation  (7).  The  weights  depend  on  the 
confidence  in  the  data  at  the  tails  of  the  distribution 
where  y  is  large  and  Ny  is  small.  Where  we  are  sure  that 
there  are  no  targets,  i.e.  the  background  is  “pure”,  it  is 
appropriate  to  assign  relatively  large  weights  at  the  tails, 
as  it  is  the  latter  which  are  important  in  discriminating 
targets  from  clutter.  In  this  case  assigning  weights  which 
have  uniform  density  in  amplitude  y,  or  even  density 
which  increases  with  y,  is  appropriate,  which  can  be  done 
by  setting  each  weight  to  be  the  difference  between  y  and 
the  next  lowest  value  of  y  in  the  same  scale  partition. 
Alternatively,  where  there  may  be  targets  present,  for 
example  during  on-line  calibration,  it  is  better  to  give  the 
tails  low  weighting,  and  rely  on  the  model  in  equation  (7) 
to  extrapolate  these.  This  can  be  done  by  assigning  equal 
weights  to  each  pair  (y,  Ny),  as  the  number  of  target 
features  is  almost  always  orders  of  magnitude  lower  than 
the  number  of  background  features. 
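
A minimal sketch of the regression-based calibration in this section is given below: a weighted line fit per scale partition in the form of equation (9), followed by the two fits against log scale in equations (10) and (11). Several points are assumptions rather than statements from this paper: the tail strength $c$ is taken as already estimated, the model is taken to be $N_y = \alpha s^{-D} \exp(-(y s^{h_0 - h}/\beta)^c)$, and the mapping from the fitted coefficients to the parameters follows from that assumed form. All function names are hypothetical.

```python
import numpy as np

def fit_line(x, z, w=None):
    """Weighted least-squares fit z = intercept + slope * x; returns (intercept, slope)."""
    sqrt_w = None if w is None else np.sqrt(np.asarray(w, dtype=float))
    slope, intercept = np.polyfit(np.asarray(x, dtype=float),
                                  np.asarray(z, dtype=float), 1, w=sqrt_w)
    return intercept, slope

def tail_weights(y_desc):
    """Weights with uniform density in amplitude: gap to the next lowest value of y."""
    y = np.asarray(y_desc, dtype=float)
    return np.append(y[:-1] - y[1:], y[-1])   # last gap measured from zero (assumption)

def calibrate(partitions, c, h0):
    """Fit equations (9) to (11) and map the coefficients to D, alpha, beta and h.

    partitions maps each scale s to (y, n_y, weights) for that scale partition.
    """
    log_s, A, neg_log_B = [], [], []
    for s, (y, n_y, w) in partitions.items():
        # Equation (9): log(N_y) = A_s - B_s * y**c is a straight line in -y**c.
        A_s, B_s = fit_line(-np.asarray(y, dtype=float) ** c, np.log(n_y), w)
        log_s.append(np.log(s))
        A.append(A_s)
        neg_log_B.append(-np.log(B_s))
    P1, P2 = fit_line(log_s, A)              # equation (10)
    P3, P4 = fit_line(log_s, neg_log_B)      # equation (11)
    # Assumed mapping: P1 = log(alpha), P2 = -D, P3 = c*log(beta), P4 = c*(h - h0).
    return {"D": -P2, "alpha": np.exp(P1), "beta": np.exp(P3 / c), "h": h0 + P4 / c}
```

Passing equal weights instead of `tail_weights` corresponds to the on-line case, where the tails may contain targets and are better extrapolated by the model than fitted directly.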


The parameters $D$, $\alpha$, $\beta$, $c$ and $h$ are now estimated using
linear least-squares regression with the weights described
above. First we need to rewrite equation (7) to allow for
the difference between the approximate self-similarity
parameter $h_0$ and the actual self-similarity $h$:

$$N_y = \alpha\, s^{-D} \exp\left(-\left(y\, s^{h_0 - h}/\beta\right)^{c}\right) \qquad (8)$$

The tail strength $c$ is estimated by fitting a linear model
to $(\log(y), \log(\log(N_y)))$ and measuring the slope. Next a
linear model is fitted to $(-y^c, \log(N_y))$ for each scale
partition, as follows:

$$\log(N_y) = A_s - B_s\, y^{c} \qquad (9)$$

where the intercepts $A_s$ and slopes $B_s$ depend on
scale $s$. Then linear models are fitted to the following
data: $(\log(s), A_s)$ and $(\log(s), -\log(B_s))$ resulting in:

$$A_s = P_1 + P_2 \log(s) \qquad (10)$$

and

$$-\log(B_s) = P_3 + P_4 \log(s) \qquad (11)$$

Substituting (10) and (11) into (9) we obtain:

$$\log(N_y) = P_1 + P_2 \log(s) - \exp\left(-P_3 - P_4 \log(s)\right) y^{c} \qquad (12)$$

Comparing equations (8) and (12), the parameters $D$, $\alpha$,
$\beta$ and $h$ are derived straightforwardly from the slopes
and intercepts $P_1$, $P_2$, $P_3$ and $P_4$.


4. Clutter Rejection from Joint Statistics

For some backgrounds, for example where clutter is
intermittent, detection sensitivity can be increased by
taking additional measurements into account, thus
introducing additional criteria for discriminating targets
and the background. In this section we describe a method
of doing this based on the modelling of the joint statistics
of wavelet amplitude and other variables describing the
background.

4.1 Vector Fields

The most straightforward example of additional
measurements is the analysis of vector fields rather than
scalar data, in which each sample pixel has a vector
value. Examples of vector fields are fluid flow velocity,
electromagnetic fields, and multispectral imagery. In this
case the joint statistics of wavelet amplitudes of each
component of the vector (treated as a scalar field) are
modelled. These wavelet amplitudes are the components
of a multiresolution vector field which is a vector
analogue of the wavelet transform:


$$\mathbf{T}(p,s) = s^{-1-h} \int \mathbf{f}(x)\, g\!\left(\frac{x-p}{s}\right) dx \qquad (13)$$

where correlation now involves pointwise multiplication
of the vector field $\mathbf{f}$ by the scalar filter $g$ and integration
is  vector  summation.  Matched  filter  theory  can  still  be 
used  to  design  analysing  wavelets,  but  now  the  power 
spectrum  contains  information  about  the  cross 
correlation  of  the  vector  components  at  each  frequency, 
as  well  as  their  autocorrelation. 


There  is  no  direct  analogue  of  the  feature  extraction  of 
Section  2  for  vector  fields,  because  local  maxima  of  the 
wavelet  transform  are  not  defined,  being  vector  valued. 
If  the  probability  density  function  (PDF)  of  the  wavelet 
transform  were  known  a  priori  and  easy  to  compute,  then 
we  could  derive  a  scalar-valued  function  of  the  wavelet 
space  by  composing  the  above  vector  wavelet  transform 
(which  maps  wavelets  to  a  vector  space)  with  the  PDF 
(which  maps  the  same  vector  space  to  the  real  line).  It 
makes  sense  to  search  for  local  minima  in  the  composite 
function,  as  these  correspond  to  locally  unusual  events. 
There  are  two  drawbacks  to  this: 

•  Joint  PDFs  are  difficult  to  estimate  accurately,  with 
the  difficulty  increasing  rapidly  with  the  number  of 
vector  components.  Over-fitting  the  PDF  to  the 
training  data  is  a  common  problem  [12]. 

•  Not  all  outcomes  of  low  PDF  correspond  to  likely 
targets.  For  example,  when  based  on  local  maxima, 
most  distributions  of  wavelet  amplitude  have  low 
PDF near the origin, corresponding to very low
feature brightness or contrast. These features are not
usually  target  related. 

In  view  of  the  above  problems,  the  usual  approach  is  to 
search  for  local  maxima  of  a  derived  scalar  quantity,  the 
Mahalanobis  distance  [13]: 

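$$D(\mathbf{T}) = (\mathbf{T} - \boldsymbol{\mu})^{\mathsf{T}}\, C^{-1}\, (\mathbf{T} - \boldsymbol{\mu}) \qquad (14)$$

where $\boldsymbol{\mu}$ is the vector mean of the wavelet transform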
values  T,  and  C  is  the  multivariate  covariance  matrix  of 
T.  For  Gaussian  distributions  D  is  proportional  to  the 
negative  logarithm  of  the  PDF,  and  so  is  an  unbiased 
measure  of  rarity.  This  measure  is  also  only  high  for 
unusually  high  wavelet  amplitudes  (relative  to  the  mean) 
at  the  external  boundary  of  the  distribution,  so  ignores 
very  faint  features.  Like  principal  component  analysis, 
this  method  is  very  effective  at  extracting  features  with 
unusual  vector  direction  as  well  as  amplitude,  especially 
when  there  is  a  high  level  of  correlation  in  vector 
direction.  This  situation  is  common  in  multispectral 
imagery  when  the  spectral  bands  of  the  background  are 
close  in  radiation  frequency,  resulting  in  a  limited  range 
of  “colour”.  Objects  which  have  an  unusual  radiation 
spectrum  (colour)  often  generate  wavelet  amplitudes 
with  a  large  Mahalanobis  distance,  even  when  none  of 
the  vector  components  is  unusually  large  regarded  in 
isolation. 
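
As a minimal sketch of equation (14), the following NumPy fragment computes D for every vector in a sample. The function name, the array layout (one wavelet-transform vector per row) and the synthetic two-component data are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def mahalanobis_distance(T):
    """D = (T - mu)^T C^-1 (T - mu) for each row of T, as in equation (14).

    T: array of shape (n_samples, n_components), one vector per row
    (hypothetical layout). The mean and covariance are estimated from T.
    """
    T = np.asarray(T, dtype=float)
    mu = T.mean(axis=0)
    C = np.cov(T, rowvar=False)
    diff = T - mu
    # Solve C z = diff^T instead of forming the explicit inverse of C.
    z = np.linalg.solve(C, diff.T).T
    return np.einsum("ij,ij->i", diff, z)

# Synthetic background with two strongly correlated components.
rng = np.random.default_rng(0)
base = rng.normal(size=2000)
background = np.column_stack([base, base + 0.1 * rng.normal(size=2000)])

# A vector of modest amplitude but unusual direction ("colour").
odd = np.array([[1.0, -1.0]])
D = mahalanobis_distance(np.vstack([background, odd]))
```

In this synthetic case the last vector receives the largest D even though neither of its components is large in isolation, which is the behaviour described above for objects with an unusual spectrum.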

The  example  in  Fig.  2  concerns  5-band  AVHRR 
(Advanced  Very  High  Resolution  Radiometer)  satellite 
imagery of sea surface background. This data is a fusion
of two visible bands (0.6 and 0.8 μm) and three infrared
bands (3.7, 11 and 12 μm). The objects of interest are
tracks which arise from ship plumes altering the spectral
properties of the cloud. These tracks are visible in Band
3 (3.7 μm), and are generally absent or very faint in the
other  bands.  Figs.  2a  and  2b  show  an  example  of  such  a 
track  against  cloud  clutter  in  Bands  1  and  3  respectively: 
the  local  SNR  of  the  track  is  greatest  in  Band  3,  but  not 
sufficient  to  be  easily  distinguished  from  the  cloud 
edges,  especially  towards  the  top  of  the  picture.  We  base 
our  analysis  on  these  two  bands  because  the  other  bands 
are  either  very  similar  in  frequency  or  contain  little  new 
information. 

Figs.  2c  and  2d  illustrate  the  advantage  of  basing  the 
Mahalanobis  distance  on  the  wavelet  transform  T  (Fig. 
2c)  rather  than  the  raw  pixel  values  (Fig.  2d);  the  ship 
track  stands  out  much  more  clearly  in  the  former  case. 

4.2  Intermittent  Backgrounds 

Another  example  of  the  use  of  additional  random 
variables  concerns  the  modelling  of  strongly  intermittent 
backgrounds,  in  which  some  regions  are  energetic 
(highly  cluttered)  and  others  are  relatively  quiescent. 
Statistical translation-invariance does not rule out this
kind of variability with respect to position; it implies
only  that  global  statistics  are  invariant.  Local  statistics, 
such  as  standard  deviation  over  a  neighbourhood  of  the 
signal  or  image,  may  still  be  position-dependent,  and 
often are for natural backgrounds. Variation in local
statistics happens either because the image background
comprises more than one texture, for example associated
with woodland, mountains and sea, or because a single
texture type is spatially intermittent, so that some
regions are more energetic than others.

Intermittency  is  represented  implicitly  in  the  statistical 
model  of  Section  3  by  the  fractal  dimension  D  and  the 
lacunarity  a,  but  no  account  is  taken  of  the  dependence 
of  the  average  amplitude  or  population  density  of  the 
geometric  features  on  their  position,  i.e.  their  tendency  to 
cluster.  Thus  whilst  the  effect  of  clustering  on  global 
population  densities  is  accounted  for,  the  variation  in 
local  density  with  position  is  not.  If  position  dependence 
is  not  modelled,  the  same  thresholds  will  be  applied  to 
energetic  and  quiescent  regions  alike,  resulting  in  either  a 
loss  of  sensitivity  in  the  latter  or  too  many  false  alarms  in 
the former. We now show how the joint statistics of
wavelet amplitude and a non-linear function, local
energy, are used to rectify the above limitation. Further
detail is given in [14].

The  most  obvious  method  of  allowing  for  spatial 
variations  in  background  activity  is  to  condition  the 
background  model  given  by  equation  (7)  with  respect  to 
position.  This  approach  would  require  statistical 
distributions  to  be  measured  over  neighbourhoods  of  the 
signal  or  image  and  then  the  model  parameters  fitted  in 
each  case  separately.  This  approach  is  not  effective 
because  the  fractal  model  parameter  estimation  only 
works  well  with  large  images  (at  least  64x64  and  usually 
much  larger)  as  filter  statistics  need  to  be  compiled  over 
a  number  of  scales  and  in  sufficiently  large  numbers  to 
estimate  a  cumulative  distribution  function.  The  fractal 
model  has  the  advantage  of  realism  when  applied  to  large 
images  but  at  the  cost  of  a  rapid  decrease  in  accuracy  for 
images  much  below  128x128  pixels,  and  is  thus 
unsuitable  for  the  representation  of  local  statistics. 

In  any  model  of  the  background  which  is  conditioned  by 
position,  there  is  an  inevitable  conflict  between  the 
accuracy  of  the  estimation  of  statistical  parameters  on  the 
one  hand,  and  spatial  resolution  of  background  intensity 
levels  on  the  other.  For  the  purposes  of  target  detection, 
good  spatial  resolution  is  important  because  the  threat, 
for  example  an  incoming  missile,  may  be  very  close  to 
clutter,  such  as  cloud  edges,  which  could  lead  to 
incorrect  thresholds  being  used  by  the  detection  process. 
The  advantages  of  measuring  background  activity  levels 
with  spatial  resolutions  of  less  than  64  pixels  in  such 
cases  are  obvious.  It  follows  that  in  the  choice  of  the 
representation  of  local  statistics,  spatial  resolution  should 
be given a high priority. To achieve high spatial
resolution, local background activity needs to be
represented by a robust statistic which requires a small
amount of image data to compute reasonably accurately.
This  is  why  a  simple  non-linear  measure,  local  energy, 
has  been  chosen  for  the  spatial  conditioning  of  the 
background  statistics. 

Local  energy  is  a  measure  of  local  background  activity 
which  is  computed  over  neighbourhoods  of  varying  size 
surrounding  each  geometric  feature  extracted  from  the 
data (Section 2). Local energy E(x, L) is defined to be
the variance of the signal or image over a region centred
on x and of size L. For 1D signals this region is a subset
of the real line, for example an interval of length L; for
2D images it is a subset of the real plane, for example a
square; and for 3D image sequences it is a subset of
space-time. Detection sensitivity is generally improved if the
support  of  the  geometric  feature  is  subtracted  from  this 
region,  so  that  the  wavelet  amplitude  and  local  energy 
are  less  mutually  dependent. 
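
The definition of local energy can be sketched for 2D images as follows. This is only an illustration under stated assumptions: the function name, the square window, the clipping at the image border and the `exclude` argument (a central square standing in for the support of the feature) are all hypothetical choices, not the authors' implementation.

```python
import numpy as np

def local_energy(image, row, col, L, exclude=0):
    """Variance of `image` over an L x L square centred (approximately) on (row, col).

    exclude: side length of a central square (standing in for the feature
    support) that is left out of the variance; 0 keeps every pixel.
    The window is clipped where it overlaps the image border.
    """
    image = np.asarray(image, dtype=float)
    r0, c0 = row - L // 2, col - L // 2
    rows = np.arange(max(r0, 0), min(r0 + L, image.shape[0]))
    cols = np.arange(max(c0, 0), min(c0 + L, image.shape[1]))
    patch = image[np.ix_(rows, cols)]
    keep = np.ones(patch.shape, dtype=bool)
    if exclude > 0:
        e0r, e0c = row - exclude // 2, col - exclude // 2
        in_r = (rows >= e0r) & (rows < e0r + exclude)
        in_c = (cols >= e0c) & (cols < e0c + exclude)
        keep[np.ix_(in_r, in_c)] = False
    return patch[keep].var()

# Synthetic scene: quiescent on the left, energetic clutter on the right.
rng = np.random.default_rng(1)
img = rng.normal(size=(64, 64))
img[:, 32:] *= 5.0

# A feature at column 24 lies close to, but not within, the clutter.
sizes = [2, 4, 8, 16, 32]
E = [local_energy(img, 32, 24, L, exclude=1) for L in sizes]
```

Here only the largest region overlaps the energetic half of the image, so its local energy is much higher than that of the smaller regions.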

The  region  size  L  is  defined  relative  to  the  scale  s  of  each 
extracted  feature  and  is  not  to  be  confused  with  the  scale 
of  the  feature  itself.  Local  energy  measurements  over 
variable  size  L  are  required  to  achieve  the  best  balance 
between  spatial  resolution  and  accuracy  of  local  energy 
measurement.  In  cases  where  a  potential  target  is  close  to 
but  not  within  an  energetic  region,  a  small  local  energy 
region  gives  the  best  probability  of  detection.  In  cases 
where  the  background  activity  is  more  uniform,  however, 
fainter  targets  can  be  detected  with  greater  sensitivity 
where  background  levels  have  been  measured  over  an 
extended  region.  The  following  region  sizes  L  have  been 
considered  in  this  paper:  2,  4,  8,  16  and  32.  We  now 
describe  how  the  joint  statistics  of  local  energy  at  each 
size  L  and  wavelet  amplitude  T  are  used  to  model 
spatially  intermittent  backgrounds  and  improve  the 
discrimination  between  targets  and  clutter,  illustrating 
the  technique  on  infrared  imagery  of  the  celestial 
background  (Fig.  3)  in  which  some  simulated  point 
targets  have  been  inserted. 

Fig.  4  shows  the  joint  statistics  of  T  and  E  in  the  form  of 
a scatter plot, where L is set to 8, background data are
represented by points, and the targets by crosses. The
scatter  plot  resembles  the  joint  PDF  of  these  random 
variables,  as  regions  of  high  population  density  in  the 
scatter  plot  correspond  to  regions  of  high  PDF.  The 
embedded  targets  are  mostly  separate  from  the 
background  distribution,  that  is  in  a  region  where  the 
background  joint  PDF  is  low.  Thus  in  this  case  most  of 
the  targets  can  be  distinguished  from  the  background  as 
statistical  outliers,  but  they  are  not  outliers  with  respect 
to  wavelet  amplitude  alone.  Some  of  the  target  crosses 
are  located  within  regions  of  high  background  density, 
but  these  all  have  high  local  energy  and  thus  correspond 
to  targets  embedded  in  strong  clutter,  in  this  case  IR 
cirrus,  where  the  targets  are  very  difficult  to  identify  by 
visual inspection. When in regions of low local energy,
the targets are bright relative to the local background and
are  thus  perceived  more  easily. 

The modelling of the joint distribution of T and E is done
along similar lines to the joint distribution of T and s in
Section 3, to avoid the disadvantages of estimating the
joint PDF of more than one random variable mentioned
in Section 4.1. Once the self-similarity parameter h has
been estimated, T is independent of scale, so the
dependence on s can be ignored. Details of the modelling
of  T  and  E  are  given  in  [14],  an  outline  of  which  is  given 
here.  The  idea  is  to  predict  the  conditional  cumulative 
probability  distribution  of  T  given  E.  Observation  of  real 
and  simulated  intermittent  data  (e.g.  Fig.  3  for  the  IR 
celestial  background)  has  shown  that  this  conditional 
distribution  is  self-similar,  being  only  dependent  on  an 
energy-dependent  dilation  factor,  which  can  be  estimated 
by  calculating  the  conditional  standard  deviation  of  T  as 
a  function  of  E.  It  has  been  found  that  a  power-law  fits 
this  function  quite  well,  so  the  following  model  of 
conditional  threshold  exceedance  probability  has  been 
used: 


$$ \Pr(T' > T \mid E' = E) = \exp\left[-\left(\frac{T}{\cdots\,E^{r}}\right)^{\cdots}\right] \qquad (15) $$


The  parameter  r  is  another  measure  of  intermittency  of 
the  background,  and  can  be  estimated  by  linear  least 
squares  regression  of  the  logarithm  of  conditional 
standard  deviation  versus  the  logarithm  of  local  energy. 
The following energy-normalised wavelet amplitude

$$ I = T E^{-r} \qquad (16) $$

is an energy-independent measure of rarity, and targets
and clutter can be discriminated more effectively based
on I instead of the un-normalised T. Fig. 5 shows the joint
statistics of I and E, where it can be seen that the
dependence on E is greatly diminished compared with T
(Fig. 4), and that a single threshold on I is sufficient to
detect most targets with very few false alarms.
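
The estimation of r and the normalisation of equation (16) might be sketched as below. The equal-count binning of E used to obtain the conditional standard deviation, the function names and the synthetic data are assumptions for illustration; the value 0.42 is simply the exponent quoted on the axis label of Fig. 5, used here to generate test data.

```python
import numpy as np

def estimate_r(T, E, n_bins=10):
    """Slope of log(conditional std of T) against log(E).

    The conditional standard deviation is approximated by sorting the
    samples on E and splitting them into equal-count bins (an assumption).
    """
    T, E = np.asarray(T, float), np.asarray(E, float)
    order = np.argsort(E)
    log_e, log_sd = [], []
    for idx in np.array_split(order, n_bins):
        log_e.append(np.log(E[idx]).mean())
        log_sd.append(np.log(T[idx].std()))
    # Linear least squares fit of log(std) versus log(E).
    slope, _intercept = np.polyfit(log_e, log_sd, 1)
    return slope

def energy_normalise(T, E, r):
    """Equation (16): I = T * E**(-r)."""
    return np.asarray(T, float) * np.asarray(E, float) ** (-r)

# Synthetic intermittent background whose spread grows as E**0.42.
rng = np.random.default_rng(3)
E = rng.lognormal(sigma=1.0, size=20000)
T = E ** 0.42 * rng.exponential(size=20000)

r_hat = estimate_r(T, E)
I = energy_normalise(T, E, r_hat)
```

With data generated this way the fitted slope should lie close to the exponent used in the simulation, and the normalised amplitude I should show little remaining dependence on E.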

The last step is to combine the rarity measures I(L) over
multiple region sizes L. The idea is to select the "best"
region  size  L  given  the  local  distribution  of  clutter.  If  the 
object  in  question  is  completely  embedded  in  relatively 
strong  clutter  then  detection  will  be  difficult  for  any  size 
L.  If  the  object  is  close  to  but  not  embedded  in  strong 
clutter  then  a  small  L  will  give  the  best  detection 
sensitivity,  but  a  larger  value  is  better  otherwise.  The 
"best"  size  corresponds  to  the  largest  region  not 
overlapping  strong  clutter,  in  which  local  statistics  are 
most  accurately  represented. 

An overall measure of rarity is therefore defined as
follows. First I(L) is normalised by dividing by its
standard deviation over all wavelets. Then for each
feature the largest region size L is selected for which the
function I'(L) given by:

$$ I'(L_n) = I(L_{n-1}) - I(L_n), \quad n \ge 2 \qquad (17) $$

is less than 3.5 for all lower scales:

$$ L_{\max} = \max\{ L_n : I'(L_k) < 3.5\sigma \;\; \forall k : 2 \le k \le n \} \qquad (18) $$

Sudden decreases in I(L) are considered because these
correspond to a step increase in local energy, which
occurs when a region of clutter is encountered, as
required. Decreases of less than 3.5 standard deviations
are ignored on the grounds that they are not strong
enough to correspond to large changes in background
statistics. An overall measure of unusualness is then
defined as the mean value of I over region sizes not
exceeding L_max:

$$ \bar{I} = \sum_{L \le L_{\max}} I(L) \Big/ \sum_{L \le L_{\max}} 1 \qquad (19) $$

Maximum and minimum values of I(L) were also
investigated but were found to be less robust
discriminators between targets and the background.
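
The selection of the region size and the averaging step can be sketched as follows. The sketch assumes that I'(L_n) denotes the decrease I(L_{n-1}) - I(L_n), so that a sudden drop in I(L) gives a large positive value; the function name, the array layout and the synthetic data are hypothetical.

```python
import numpy as np

def overall_rarity(I, threshold=3.5):
    """Combine energy-normalised amplitudes over region sizes.

    I: array (n_features, n_sizes), columns ordered by increasing region
    size L (hypothetical layout).
    Returns (rarity, n_used): the mean of I over the sizes up to L_max and
    the number of sizes averaged for each feature.
    """
    I = np.asarray(I, dtype=float)
    I = I / I.std(axis=0)                      # normalise each L over all wavelets
    drop = I[:, :-1] - I[:, 1:]                # decrease from L_{n-1} to L_n (assumed sign)
    ok = np.cumprod(drop < threshold, axis=1)  # stays 1 until the first large drop
    n_used = 1 + ok.sum(axis=1)                # the smallest size is always kept
    mask = np.arange(I.shape[1]) < n_used[:, None]
    rarity = (I * mask).sum(axis=1) / n_used
    return rarity, n_used

# Synthetic background plus one feature whose I(L) collapses once the
# region reaches the fourth size, as if clutter had been encountered.
rng = np.random.default_rng(2)
I_all = np.vstack([rng.normal(size=(500, 5)), [8.0, 8.0, 8.0, 0.0, 0.0]])
rarity, n_used = overall_rarity(I_all)
```

For the last feature only the first three region sizes are averaged, so the sharp fall at the fourth size does not dilute its rarity score.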

Figs.  6a  and  6b  compare  the  target  detection 
performance  of  the  original  scale-normalised  wavelet 
amplitude T and the new energy-normalised measure I,
where  in  both  cases  the  threshold  has  been  set  at  2 
standard  deviations  (of  T  or  /).  The  use  of  T  results  in  21 
targets  being  detected  along  with  19  false  alarms, 
compared  with  29  targets  and  only  1  false  alarm  when  I 
is  used.  The  improvement  in  performance  is  due  to  the 
ability  to  take  local  background  conditions  into  account 
in  the  neighbourhood  of  each  target. 

4.3  Image  Sequence  Analysis:  Track  Before  Detect 

The  final  example  of  the  use  of  additional  random 
variables  is  to  aid  the  detection  of  targets  in  time- 
sequenced  imagery,  where  target  motion  and  the 
evolution  of  clutter  in  time  need  to  be  taken  into  account. 
Many  target  detection  methods,  including  most  earlier 
ones  [15],  are  based  on  “track  after  detect”,  in  which 
initial  processing  is  limited  to  single  frames,  and  track 
association  algorithms  are  employed  subsequently  to 
characterise  movement.  These  methods  are  not  effective 
when  there  is  insufficient  information  in  single  frames  to 
discriminate  targets  from  a  large  number  of  clutter 
objects,  typically  when  the  target  is  amongst  clutter  of 
equal or greater edge contrast, or of similar texture, such
as  roads,  buildings,  cloud  edges  and  sea  glint.  Human 
vision  has  the  same  limitation;  objects  which  cannot  be 
recognised  in  still  frames  can  often  be  detected 
subliminally  in  a  movie  by  virtue  of  movement  or  other 
distinguishing  temporal  behaviour. 


We  consider  instead  a  “track  before  detect”  (TBD) 
method,  which  combines  image  data  from  different 
frames  prior  to  detection,  treating  the  data  as  a  single  3- 
dimensional  set,  and  applying  3D  filters.  In  this  context, 
movement  is  inherent  in  the  3D  orientation  of  geometric 
features  within  the  data,  and  directly  affects  the  response 
of  3D  filters.  MFT  is  applicable  to  the  analysis  of  3D 
image  sequence  data  [16,17],  using  the  same  underlying 
principles as for 1D signals and 2D images. The PSD is a
3D  function,  providing  information  about  both  spatial 
and  temporal  correlation  of  the  background. 

In  the  case  of  point  targets  against  a  bland  background, 
where  uncorrelated  sensor  noise  dominates,  MFT  results 
in  a  classic  velocity-tuned  filter  which  integrates  image 
intensity  along  the  trajectory  of  the  target.  A  very 
different  filter  is  required  for  scenes  with  strong  static 
clutter  but  little  sensor  noise.  The  background  is  usually 
correlated in space (e.g. with a 1/f PSD) and very
strongly correlated in time (nearly constant), so MFT
leads to mean subtraction over time and a decorrelation
filter in space, e.g. a (fractional) Laplacian filter [16].
Additional  spatial  smoothing  is  required  for  larger 
targets.  For  both  types  of  background,  however,  the 
matched  filter  is  strongly  dependent  on  velocity  for  small 
targets,  so  often  a  large  number  of  filters  are  required  for 
low  targets  with  low  SNR. 

We have been able to take advantage of symmetry in
static  downward-looking  imagery  to  simplify  the 
background  statistical  models,  using  invariance  with 
respect  to  translation,  rotation  and  scale.  Unfortunately, 
the  statistics  of  space-time  structure  in  image  sequences 
are  often  much  more  complex,  with  less  natural 
symmetry. Whereas the two spatial dimensions often
have an equal footing and similar statistics, the time
dimension is generally different; for example, a scene
may be very highly correlated in time but much less so in
space. The space-time equivalent of rotation is
(approximately) a Galilean transformation, i.e. a
velocity shift. The statistics of space-time structure are
usually strongly dependent on velocity shifts, unlike
spatial rotation. The effect of time dilation is usually
quite different from spatial dilation, and both usually
affect the statistics of movement dramatically, as can be
seen when a movie is played at the wrong speed. In
forward-looking  imagery  the  situation  is  even  worse,  as 
there  is  no  longer  invariance  with  respect  to  the  vertical 
image  co-ordinate,  or  to  spatial  orientation. 

The  complexity  of  space-time  image  structure  can  be 
seen  with  forward-looking  imagery  of  sea  glint  (Fig.  7). 
A  bank  of  simple  3x3x3  space-time  bar  filters  has  been 
applied  to  pick  out  bars  with  13  different  orientations. 
Their  statistics  are  very  complex  for  sea  glint,  as  shown 
by  Fig.  8,  where  the  thresholds  for  an  exceedance 
probability  of  10%  are  compared.  These  filters  have 
been  normalised  to  give  identical  exceedance  rates  in 
Gaussian white noise. With this forward-looking imagery
much of the spatial symmetry is also lost, as the clutter
strength is strongly dependent on the vertical image
co-ordinate y and also on spatial orientation, with stronger
returns from horizontal bars and edges than vertical ones.
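
One way to arrive at 13 orientations for a 3x3x3 bar is to take the 26 neighbours of the central voxel and identify opposite directions. The sketch below builds such a bank under that assumption. The (t, y, x) axis order, the zero-mean kernels and the unit-norm scaling are illustrative choices rather than the filters actually used; unit norm is one simple way of giving every filter the same output variance, and hence the same exceedance rate, in Gaussian white noise.

```python
import itertools
import numpy as np

def bar_directions():
    """The 13 distinct line orientations through the centre of a 3x3x3 cube."""
    dirs = []
    for d in itertools.product((-1, 0, 1), repeat=3):
        # Keep one member of each +/- pair and skip the zero vector.
        if d != (0, 0, 0) and d > tuple(-c for c in d):
            dirs.append(d)
    return dirs

def bar_kernel(direction):
    """Zero-mean, unit-norm 3-voxel bar along `direction`, indexed (t, y, x)."""
    k = np.zeros((3, 3, 3))
    for step in (-1, 0, 1):
        t, y, x = (1 + step * c for c in direction)
        k[t, y, x] = 1.0
    k -= k.mean()
    return k / np.linalg.norm(k)

directions = bar_directions()
bank = [bar_kernel(d) for d in directions]
```

In this indexing the direction (1, 0, 0) is a bar along the time axis, that is, a feature with no apparent motion.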

The  wavelet  transform  has  fewer  advantages  when 
applied  to  time-sequence  imagery  than  to  static  images 
because  of  the  above  lack  of  symmetry.  The 
conventional  3D  wavelet  transform  applies  a  single 
dilation  operator,  whereas  space  and  time  should  be 
scaled  independently.  It  is  possible,  however,  to  employ 
a non-homogeneous form of 3D wavelet analysis [18] to
scale  space  and  time  independently. 

The  approach  of  using  a  bank  of  matched  filters  involves 
looking  for  unusually  strong  responses  from  each  filter, 
regardless  of  the  responses  from  the  other  filters,  and  so 
requires  knowledge  only  of  the  univariate  statistics  of 
single filter responses. Instead, we consider the
multivariate statistics of a small number of filters which
individually are not as good discriminators of targets and
clutter, but when considered jointly can be more effective
at  this  task.  As  with  static  imagery,  we  consider  a  small 
number  of  simple  geometric  features,  oriented  bars  in 
space-time,  which  in  this  case  represent  moving  blobs, 
where  3D  orientation  corresponds  to  the  velocity  of 
motion.  The  aspect  ratio  of  these  bars  (ratio  of  length  to 
width)  is  small,  about  3,  so  the  resolution  in  velocity  is 
low. 

We  now  consider  what  additional  filtering  will  aid  clutter 
rejection,  in  the  same  way  that  local  energy  was  used  for 
intermittent  static  backgrounds  (Section  4.2). 
Conventional  target  detection  methods  [19]  perform 
poorly  on  sea  glint  compared  to  human  vision,  because 
the  target  detection  filters  also  respond  well  to  sea  glint, 
not recognising its transience. It is easier to recognise the
target because it does not respond well to a transient
detection filter, than because it triggers a filter tuned to
the target. This leads us to consider the joint statistics of
3D bar filters, weakly tuned to velocity, and simple
non-linear filters measuring the variability in brightness
along the length of each bar:


$$ V = \max_{k} \left( \cdots \right) \qquad (20) $$

where x_k are the pixel values along the central axis of the
3D bar. This filter will give a strong response to
intermittent sea glint (flashes) but usually a much weaker
response to a moving target.

The joint statistics of a pair of these filters are shown in
Fig. 9a for sea glint, where an incoming point target has
been simulated which is significantly fainter than the
glint. The target has been marked in Fig. 9a with a cross,
and lies well outside the background distribution, but
like local energy (Section 4.2) it is only by considering
the joint statistics of these filters that discrimination
between the target and clutter is possible. In some
situations only targets coming head-on, with no apparent
motion, will be of interest. Where lateral motion is also
expected, the above joint statistics could be studied
independently for filter pairs, each tuned to a different
velocity. However, with the extra discrimination
provided by the variability filter, only a low resolution in
velocity is required, and hence a small number of filter
pairs.

Additional discrimination is available by adding a third
measurement to the joint statistics, the vertical image
co-ordinate Y, to take account of the reduction in sea glint
strength nearer the horizon, and also the very different
space-time structure above the horizon (Fig. 9b). The
joint statistics of the bar filter, variability and Y are not
well suited to a parametric model like that involving
local energy (Section 4.2), partly because there are
arbitrary discontinuities, for example at the horizon.
Instead a non-parametric method of conditional threshold
exceedance probability estimation is used, described
briefly next. Details are available in [19].

The idea is to estimate the conditional threshold
exceedance probability of a single target-related
measurement T given a set of background-related
measurements B. In our case the target measurement is
the bar filter output, and the background measurements
are V and Y. This method is also applicable to Section
4.2, where the background measurement is local energy
E. The conditional probability can be written:

$$ \Pr(T > T_0 \mid B = B_0) \qquad (21) $$

We consider conditional rather than joint probability
because the marginal probability density of the
background measurements is assumed not to be of
interest; for example if V is unusually large, this does not
imply that a target is very likely, only that a transient
glint of unusual strength occurs. If, on the other hand, the
bar filter is unusually strong given a measured level of
glint, a target is more likely. The values of B are of no
significance, other than in conditioning the value of T.


Estimating  the  probability  given  by  equation  (21) 
requires  some  sort  of  local  averaging  in  B,  using  a 
window,  assuming  the  latter  has  a  continuous 
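distribution. The following is a simple estimator of this
probability:

$$ \Pr(T > T_0 \mid B = B_0) \approx \frac{\sum_{k:\,T_k > T_0} \exp(-A \lVert B_k - B_0 \rVert)}{\sum_{k} \exp(-A \lVert B_k - B_0 \rVert)} \qquad (22) $$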

where the exponential function is a Parzen window
which gives greater weighting to points which are close
to B0 than those further away, and A is the reciprocal of
the radius of the window. As with probability density
estimation, when selecting A there is a trade-off between
high resolution (A is large) and statistical accuracy (A is
small). We adapt A according to the density of values B
in the neighbourhood of B0, so that a large radius is used
in sparse regions and a small radius in dense regions.
This  is  achieved  by  choosing  A  such  that  the 
denominator  of  equation  (22)  is  a  prescribed  value  which 
can  be  thought  of  as  the  number  of  measurements  being 
taken  into  account  when  estimating  the  conditional 
probability. 
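
A minimal sketch of the estimator of equation (22), with A adapted so that the denominator takes a prescribed value, is given below. The bisection search, the function names, the default `n_eff` and the synthetic data are assumptions made for illustration; the method actually used is described in [19].

```python
import numpy as np

def window_sum(dist, A):
    """Denominator of equation (22) for a given reciprocal radius A."""
    return np.exp(-A * dist).sum()

def adapt_A(dist, n_eff, iters=60):
    """Find A >= 0 such that the window sum is close to n_eff, by bisection."""
    lo, hi = 0.0, 1.0
    # The sum decreases as A grows; widen the bracket until it falls below n_eff.
    while window_sum(dist, hi) > n_eff and hi < 1e12:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if window_sum(dist, mid) > n_eff:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exceedance_probability(T, B, T0, B0, n_eff=50.0):
    """Windowed fraction of samples with T_k > T0, centred on B0."""
    T = np.asarray(T, float)
    B = np.asarray(B, float).reshape(len(T), -1)
    dist = np.linalg.norm(B - np.atleast_1d(B0), axis=1)
    A = adapt_A(dist, n_eff)
    w = np.exp(-A * dist)
    return w[T > T0].sum() / w.sum()

# Synthetic data in which the spread of T grows with the background value B.
rng = np.random.default_rng(4)
B = rng.uniform(0.05, 1.0, size=5000)
T = B * rng.exponential(size=5000)

p_quiet = exceedance_probability(T, B, 0.5, 0.2, n_eff=200.0)
p_busy = exceedance_probability(T, B, 0.5, 0.8, n_eff=200.0)
```

The same threshold T0 is far less likely to be exceeded where the background measurement is small, which is the effect that the conditioning is intended to capture.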

Equation  (22)  is  an  unbiased  estimator  of  equation  (21) 
only  if  the  true  density  is  independent  of  B  within  the 
region  of  influence  of  the  Parzen  window.  Where  there  is 
global  independence,  there  is  no  need  for  a  window 
function;  estimates  of  unconditional  exceedance 
probability  will  suffice.  If  there  is  dependence  on  B,  then 
at  most  points  the  gradient  of  exceedance  probability  will 
be  non-zero,  and  thus  linear  dependence  will  be  a  better 
model  than  a  constant  value  (local  independence).  This 
can  be  seen  clearly  in  the  example  of  Fig.  9a,  where 
contours  of  constant  density  are  not  parallel  to  the 
vertical  axis,  but  appear  to  be  smooth,  implying  a  linear 
model  would  be  a  major  improvement  over  a  constant 
model. 

A linear model is defined using the following formula:

$$ \Pr(T > T_0 \mid B = B_0) \approx \frac{\sum_{k} \chi\big(T_k - T_0 + L(B_k - B_0)\big) \exp(-A \lVert B_k - B_0 \rVert)}{\sum_{k} \exp(-A \lVert B_k - B_0 \rVert)} \qquad (23) $$

where L is a linear function of the random variables in B
and χ is the step function which returns 0 if the input is
less than 0 and 1 otherwise. The coefficients of the
function L are estimated from the data using linear least
squares regression, but with the data values weighted
according to the similarity of B to B0 and the magnitude
of T, with a bias towards large values of T, so
that attention is given to the tails of the distribution.
Once  L  is  estimated,  equation  (23)  is  used  to  estimate  the 
conditional  threshold  exceedance  probability. 

Figs.  10a  and  10b  show  the  10  events  with  the  lowest 
threshold  exceedance  probability,  estimated  using  the 
above  method,  both  on  the  scatter  plot  of  T  against  V, 
and  their  corresponding  locations  within  the  image.  The 
sea  target  is  the  2nd  most  unusual  event,  with  an 
extremely  low  estimated  exceedance  probability  of 
1.7 × 10^-29. Without the conditioning with respect to the
background  variables,  the  target  exceedance  probability 
would  have  been  much  higher,  with  the  sea  glint 
dominating  the  results.  From  a  visual  inspection  of  the 
scatter plots (Figs. 9a and 9b), the target appears to be the
least probable event, but the point with greatest rod filter
output has been assigned an exceedance probability of
zero. This is a technical failing of the estimation
algorithm; a hard decision boundary is used in the
numerator of equation (23), so whilst uncertainty in the
background  random  variables  Y  and  V  is  taken  into 
account  in  the  Parzen  window,  there  is  as  yet  no 
analogous  uncertainty  in  the  target  variable  T.  If  there 
happen  to  be  no  sample  values  to  the  right  of  the  linear 
decision  boundary,  the  current  estimate  of  exceedance 
probability  is  therefore  zero.  This  limitation  could  be 
rectified  by  using  a  soft  decision  boundary,  for  example 
using the logistic function instead of the step function χ.
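
The effect of a soft decision boundary can be illustrated with the sketch below, which evaluates the estimator of equation (23) with either a hard step or a logistic function. The coefficients of L are taken as given (`coef` is a hypothetical argument, assumed to have been fitted elsewhere), the sign convention inside the step follows equation (23) as printed, and the `softness` parameter is an assumption; the paper does not specify how a logistic boundary would be scaled.

```python
import numpy as np

def soft_exceedance(T, B, T0, B0, coef, A, softness=0.0):
    """Equation (23) with an optional logistic (soft) decision boundary.

    coef: coefficients of the linear function L, so that L(b) = coef @ b
          (assumed to have been fitted elsewhere).
    softness: width of the logistic transition in units of T; 0 gives the
          hard step function.
    """
    T = np.asarray(T, float)
    B = np.asarray(B, float).reshape(len(T), -1)
    dB = B - np.atleast_1d(B0)
    w = np.exp(-A * np.linalg.norm(dB, axis=1))
    margin = T - T0 + dB @ np.atleast_1d(coef)
    if softness > 0:
        # Logistic function written with tanh to avoid overflow.
        step = 0.5 * (1.0 + np.tanh(0.5 * margin / softness))
    else:
        step = (margin >= 0).astype(float)
    return (step * w).sum() / w.sum()

# A threshold lying beyond every sample: the hard boundary returns zero,
# whereas the soft boundary returns a small positive probability.
T_s = np.array([1.0, 2.0, 3.0])
B_s = np.zeros(3)
p_hard = soft_exceedance(T_s, B_s, 4.0, 0.0, coef=[0.0], A=1.0)
p_soft = soft_exceedance(T_s, B_s, 4.0, 0.0, coef=[0.0], A=1.0, softness=0.5)
```

With no sample beyond the boundary the hard estimate is exactly zero, which is the limitation noted above; the logistic version still assigns a small non-zero probability.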

As  with  many  detection  methods,  temporal  consistency 
can  be  used  as  an  additional  criterion  to  improve 
performance  further.  Although  in  this  example  the  target 
has  been  detected  successfully  from  just  a  single 
snapshot  of  5  frames,  if  the  SCR  was  much  less,  the 
exceedance  probabilities  of  the  most  intense  sea  glint 
could  be  as  low  as  that  of  the  target.  When  the  above 
analysis  is  repeated  on  later  groups  of  frames,  the  target 
continues  to  have  a  very  low  exceedance  probability, 
whereas  the  locations  of  the  other  low-probability  events 
within  the  sea  change.  Fig.  11  shows  the  top  40 
detections  superimposed  from  each  of  four  successive 
time  intervals,  in  which  it  can  be  seen  that  the  only 
persistent  events  are  the  target,  a  few  points  on  the  land 
(which  are  unchanging),  plus  3  points  within  the  sky 
(which  are  known  to  be  dead  pixels  and  therefore 
constant).  Out  of  these  persistent  events,  the  estimated 
target probability (ranging between 10^-8 and 10^-29) is
much lower than that of the other events (typically 10^-3 to
10^-5).

Fig. 2: Detection of ship track in AVHRR imagery using vector wavelets
(c) Detections using wavelet-based Mahalanobis distance
(d) Detections using pixel-based Mahalanobis distance

Fig. 3: IRAS image of celestial background
with simulated point targets embedded

Fig. 4: Joint distribution of wavelet
intensity (T) and local energy (E)

Fig. 5: Joint distribution of energy-normalised
wavelet intensity (T/E^0.42) and local energy (E)

Fig. 6a: Detections at a threshold of 2σ
based on wavelet intensity (T)

Fig. 6b: Detections at a threshold of 2σ
based on energy-normalised wavelet intensity (I)

Squares = embedded targets, diamonds = false alarms

Fig. 7: Sea glint data
Single-pixel slowly-moving target (arrowed) artificially embedded in sea clutter

Fig. 8: Statistics of responses of simple 3x3x3 space-time bar filters
(horizontal axis: oriented bar filters)

Fig. 9a: Joint statistics of rod filter output (T) and variability measure (V)

Fig. 9b: Joint statistics of rod filter output (T) and vertical position (Y)

Fig. 10a: The 10 most unusual events as a result of 3-D statistical analysis,
shown within the joint statistics of rod filter output (T) and variability measure (V)

Fig. 10b: Locations of the 10 most unusual events from 3-D statistical analysis

Fig. 11: Demonstration of lack of persistency of sea glint
Top 40 detections within 4 successive time intervals superimposed
Detections marked with plus, cross, circle and triangle respectively

1. Introduction

This  paper  is  concerned  with  the  extraction  of  geometric 
information  from  signals  and  images  which  is  suitable 
for  pattern  recognition,  but  which  is  as  insensitive  as 
possible  to  the  conditions  of  data  collection,  in  particular 
sensor  noise,  background  clutter  and  changes  in  the 
geometric  relationship  between  the  sensor  and  the  scene 
(e.g.  3D  viewing  aspect).  We  describe  a  method  of 
feature  extraction  which  involves  extending  wavelet 
analysis  beyond  translation  and  dilation,  leading  to 
invariance  to  a  greater  variety  of  transformations,  as  well 
as  insensitivity  to  noise  and  clutter.  Feature  extraction 
involves  searching  for  local  maxima  in  a  generalisation 
of  the  continuous  wavelet  transform,  and  we  describe 
how  Riemannian  geometry  is  used  to  aid  this  process. 
Additional  topics  covered  are  signal  and  image 
reconstruction,  redundancy  removal  and  application  to 
3D  aspect-invariant  target  recognition. 

The  analysis  of  any  data,  including  signals  and  images, 
involves  the  extraction  of  application-specific 
information  and  the  rejection  of  other  less  relevant  data. 
Transforming  the  data  into  a  form  where  the  information 
of  interest  is  easier  to  obtain  often  facilitates  the 
analysis.  Clearly,  general-purpose  transformations  are 
more  useful  if  they  transform  the  data  into  a  form  of 
information  that  is  relevant  to  many  applications.  This  is 
the  case  with  the  Fourier  transform  because  many 
phenomena  of  interest  to  scientists  and  engineers  are 
more  simply  described  in  terms  of  sinusoids  than  the  raw 
data  samples.  The  wavelet  transform  [1,2]  is  another 
example,  which  has  become  increasingly  popular  in 
recent  years  because  of  its  ability  to  identify  both 
position and scale, and hence the role it plays in
multi-resolution signal and image processing.

The  motivation  behind  the  material  covered  in  this  paper 
is the desire to recognise objects of interest, but to ignore
other  information  not  of  interest,  and  to  segregate  the 
two  as  efficiently  as  possible.  However,  as  with  the 
Fourier  and  wavelet  transforms,  we  wish  the  method  of 
analysis  to  be  as  generic  as  possible,  so  rather  than 
attempting  a  high-level  characterisation  of  the  sensor 
data,  for  example  the  classification  of  military  targets,  a 
lower-level  characterisation  in  terms  of  simple  geometric 
features  such  as  sine  waves  or  compact  wavelets  is 
sought.  The  aim  is  to  represent  application-specific  data 
as  combinations  of  these  lower-level  features. 


There  are  two  categories  of  data  which  we  wish  to 
ignore,  or  at  least  separate  from  the  object  of  interest: 

•  Noise  and  clutter.  We  aim  to  ignore  this  by  selecting 
a  subset  of  the  above  geometric  features,  related  to 
the  information  of  interest,  and  rejecting  the 
remainder.  The  feature  decomposition  should  be 
designed  to  facilitate  this  selection. 

•  Information  describing  the  geometric  relationship 
between  the  sensor  and  the  information  of  interest, 
not  the  information  itself.  For  imaging  sensors  this 
comprises  range,  bearing  and  3D  orientation  of  the 
objects  being  viewed.  For  acoustic  and  radar  sensors 
this  comprises  range,  time  of  arrival,  multipath, 
Doppler  shift  and  possibly  other  effects  such  as 
distortion.  These  variables  are  sometimes  called 
nuisance  parameters.  We  aim  to  reject  this 
information  by  constructing  canonical  forms 
(Section  6)  which  do  not  depend  on  this  geometrical 
relationship. 

We  thus  conceptually  segregate  the  signal  or  image  data 
into  information  of  interest,  the  foreground,  and  the 
remainder,  the  background.  The  extraction  of 
information  from  the  data  is  now  analogous  to  the 
detection  and  characterisation  of  the  foreground  and  the 
rejection  of  the  background,  where  a  detection  occurs 
when  a  foreground  feature  is  extracted  from  the  data. 
The  extracted  features  can  therefore  be  interpreted  as  the 
output  from  a  set,  or  dictionary,  of  detection  filters 
applied  to  the  signal  or  image.  This  dictionary  depends 
on  the  application;  the  foreground  and  background 
should  be  easy  to  segregate  based  on  the  outputs  of  the 
filters  it  comprises.  The  Fourier  dictionary,  sines  and 
cosines  (or  complex  exponentials),  is  suitable  for 
foregrounds  with  strong  harmonics,  and  has  the 
additional  advantage  that  the  amplitudes  of  the  Fourier 
coefficients  are  translation-invariant.  The  wavelet 
dictionary  comprises  all  translations  and  dilations  of  the 
mother  wavelet,  and  so  is  suitable  if  the  position  and 
scale  of  the  foreground  objects  in  the  signal  or  image  are 
not  known  a  priori. 
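
The translation invariance of the Fourier amplitudes mentioned above can be checked numerically. The following is a minimal sketch rather than part of the method itself; it assumes a discrete signal and uses a circular shift as a stand-in for translation, and the names `fourier_amplitudes`, `signal` and `shifted` are illustrative only.

```python
import numpy as np

def fourier_amplitudes(signal):
    """Return the moduli of the discrete Fourier coefficients of a 1-D signal."""
    return np.abs(np.fft.fft(signal))

rng = np.random.default_rng(0)
signal = rng.standard_normal(128)
shifted = np.roll(signal, 17)   # circular translation by 17 samples

amp_original = fourier_amplitudes(signal)
amp_shifted = fourier_amplitudes(shifted)

# A shift only changes the phase of each coefficient, so the moduli agree
# up to floating-point rounding.
max_difference = float(np.max(np.abs(amp_original - amp_shifted)))
```

The phases of the coefficients do change under the shift; only the amplitudes are invariant, which is why the amplitudes alone are useful as translation-insensitive features.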

In  general,  an  economical  but  complete  representation  of 
the  signal  or  image  is  sought  in  order  that  the 
information  of  interest  is  easily  extracted  but  still  fully 
represented.  A  conventional  approach  to  achieving 
economy  and  completeness  is  to  use  a  fixed  basis  for  the 
dictionary so that the component functions span the
signal  space  and  are  linearly  independent,  as  with  the 
discrete  Fourier  and  wavelet  transforms.  However,  the 
use  of  a  fixed  basis  constrains  the  size  of  the  dictionary, 
limiting  the  scope  and  sensitivity  of  the  signal  or  image 
representation. 

More  recent  developments  have  therefore  considered 
adapting  the  dictionary  functions  separately  to  each 
signal  or  image  rather  than  using  a  fixed  dictionary  for 
all  signals.  In  Coifman  and  Wickerhauser  [3],  the  master 
dictionary  is  a  specially  constructed  library  of 
orthonormal  bases.  An  orthonormal  dictionary  is  adapted 
from  this  library  in  order  to  minimise  the  information 
content  in  the  representation  of  the  signal.  More 
generally  [4,5],  the  master  dictionary  is  non-orthogonal, 
but  a  smaller  subset  is  similarly  adapted  to  the  signal  to 
maximise  the  efficiency  of  the  representation.  This 
approach  is  aimed  at  resolving  the  conflict  between 
flexibility,  where  a  large  dictionary  would  ordinarily  be 
required,  and  economy,  and  has  been  adopted  both  in  the 
Wavelet  Packet  Transform  of  Coifman  et  al  [4]  and  also 
in  the  Matching  Pursuit  algorithm  of  Mallat  et  al  [5], 
where  in  both  cases  the  functions  are  chosen  or 
generated  to  give  the  best  approximation  to  the  signal  or 
structures  within  the  signal. 

This  paper  considers  a  variant  of  the  Matching  Pursuit 
algorithm,  in  which  the  master  dictionary  is  a  manifold 
of  infinitely  many  functions.  Like  Matching  Pursuit,  the 
idea  is  to  generalise  the  wavelet  transform,  so  that  the 
dictionary  is  generated  from  a  mother  wavelet,  but 
includes  transformations  other  than  translation  and 
dilation.  This  enables  the  efficient  and  invariant 
representation  of  a  greater  variety  of  foreground 
information,  for  example  with  variable  orientations, 
affine  transformations,  frequency  shifts  and  chirp  angles. 
Because  the  dictionary  is  generated  by  the  action  of  a 
continuous  group  of  transformations  on  a  mother 
wavelet,  for  convenience  the  method  is  denoted  shape 
adaptive  wavelet  analysis,  the  dictionary  is  called  a 
wavelet  space,  the  constituent  functions  are  called 
wavelets,  and  correlation  of  the  signal  or  image  with 
these  functions  is  called  the  shape  adaptive  wavelet 
transform  (SAWT). 

We adapt the wavelets to the signal or image data by
selecting only those whose SAWT values are local maxima,
in a manner analogous to Mallat and Hwang [6]. We show
that these functions have three useful properties:

• They represent structures with locally maximal
signal-to-noise ratio (SNR), and are thus well suited
to represent foreground information (Sections 2.3
and 2.4).

• They are the best local approximations to the signal
or image [6], thus enhancing the quality of the
representation and reconstruction of the original
data (Section 2.4).

• The features extracted from local maxima are
invariant under the transformations associated with
the wavelet space, for example translation, rotation
and dilation. This property aids sensor-invariant
pattern recognition (Section 6).

The selected wavelets are not in general orthogonal, but
a method of redundancy removal is presented (Section 5)
which increases parsimony and also simplifies the
process of reconstructing the foreground.

The remainder of this paper is laid out as follows.
Section 2 describes the underlying principles of the
wavelet space, the extraction of geometric features at
local maxima in the SAWT, and how these are related to
matched filters. Section 3 gives an overview of the
methods covered in this paper, including feature
extraction, signal or image reconstruction and
sensor-invariant representation. Section 4 gives details of
feature extraction, including the use of Riemannian
geometry and local co-ordinate transformations to
improve efficiency, and the use of shape-adaptive
wavelets for characterising foreground objects. Section 5
discusses redundancy removal and signal or image
reconstruction. Section 6 describes how canonical forms
are constructed from the above feature decompositions,
which are independent of viewing aspect and related
sensor settings, and their use in aspect-invariant
recognition of aircraft viewed from an imaging sensor.


2. Shape Adaptive Wavelet Analysis

2.1. Matched filters and dictionaries

In this paper we assume all signals and images are real
or complex valued functions defined on the real line (e.g.
acoustic signals), a 2D plane (e.g. grey-level images), or
a higher-dimensional space (e.g. image sequences). We
will develop the theory for complex functions, but the
theory of real-valued functions is very similar.
Expressed formally, we consider functions which are
members of the Hilbert space $F_n$ of Lebesgue
square-integrable functions $f: \mathbb{R}^n \to \mathbb{C}$ with the inner product

$$\langle f_1, f_2 \rangle = \int_{\mathbb{R}^n} f_1(u)\,\overline{f_2(u)}\,du \qquad (1)$$

for any two functions $f_1, f_2 \in F_n$ and $u \in \mathbb{R}^n$. We are
interested in a dictionary of filters which respond well to
foreground information. Here we consider linear filters
only, each of which can be written as the following inner
product:

$$\langle f, w \rangle = \int_{\mathbb{R}^n} f(u)\,\overline{w(u)}\,du \qquad (2)$$

where $f \in F_n$ is the signal or image, $w \in F_n$ is the filter,
$n$ is the dimension of the signal or image and $\langle f, w \rangle$ is
the output of the filter. We use matched filter theory [7]
to derive linear filters $w$ which will optimally detect a
given foreground signal $f$ when embedded in background
noise with power spectral density $N$:

$$W(\omega) = \frac{F(\omega)}{N(\omega)} \qquad (3)$$

where $F$ and $W$ are the Fourier Transforms of $f$ and $w$
respectively. The matched filter $w$ is, however, as
specific in position, scale and shape as $f$. To detect (i.e.
distinguish from the background) a wide variety of
foreground structures requires a large dictionary of
filters $\{w\}$, so in this paper we consider dictionaries
with a manifold structure, where the dimension is
typically 3 or more.


2.2. The wavelet space and SAWT

Shape adaptive wavelet analysis transforms a signal or
image into a linear combination of dictionary functions,
in common with the Fourier, Wavelet and Wavelet
Packet Transforms. The dictionary, called the wavelet
space, is a $C^2$ manifold (continuously differentiable to
second order) of infinitely many functions called
wavelets which are chosen to enable us to detect
structures in the signal or image which provide
information of interest. Because the wavelet space is a
manifold, for each wavelet $w$ in the wavelet space there
is a chart (co-ordinate system) $c: W \to \mathbb{R}^{\dim(W)}$ which maps
the wavelet to a vector $x = c(w)$ where the elements in
this vector are the wavelet’s co-ordinates with respect to
the chart $c$. We have a lot of freedom of choice of the
chart, and as we shall see in Section 4, carefully
constructed charts benefit feature extraction.

We use the inner product given in equation (2) to define
the following real- or complex-valued function on the
wavelet space,

$$T_f(w) = \langle f, w \rangle \qquad (4)$$

which in the context of this paper is called the Shape
Adaptive Wavelet Transform (SAWT). This function,
analogous to the continuous wavelet transform,
represents, at least in principle, the action of the entire
dictionary of filters in the wavelet space $W$ on the signal
or image $f$.

Examples of transforms and ‘wavelets’ in common use
are:

• The continuous Fourier Transform [8], where the
‘wavelets’ $\{w\}$ are (untruncated) sinusoids
parameterised by frequency.

• The continuous Wavelet Transform [1,2], where the
wavelets $\{w\}$ are translated and scaled versions of
the mother wavelet $w_0$, parameterised by position $p$ and
scale $s$:

$$w_{p,s}(t) = \frac{1}{\sqrt{s}}\, w_0\!\left(\frac{t - p}{s}\right) \qquad (5)$$

• The Short Term Fourier (Gabor) Transform [1,2],
where the wavelets are truncated sinusoids, typically
multiplied by a windowing function $h$, and are
parameterised by position $p$ and frequency $f$:

$$w_{p,f}(t) = h(t - p)\, e^{2\pi i f (t - p)} \qquad (6)$$

• The Hough Transform [9] for images, where the
wavelets are traditionally taken to be lines
parameterised by gradient and intercept.


2.3. Whitening

Each  of  the  above  transforms  can  be  used  to  optimally 
detect  signal  structures  which  approximate  the  respective 
dictionary  functions  when  embedded  in  uncorrelated 
(white)  noise,  because  for  this  type  of  background  the 
matched  filter  is  identical  to  the  signal  or  image  structure 
being  sought.  Thus  the  Fourier  Transform  is  an  optimal 
detector  of  sinusoids,  the  Wavelet  Transform  is  an 
optimal  multi-resolution  detector  for  the  mother  wavelet, 
the  Gabor  Transform  is  an  optimal  detector  for  truncated 
sinusoids  and  the  Hough  Transform  is  an  optimal 
detector  for  linear  features  in  images.  Where  the 
background  is  correlated  (not  white)  the  wavelet  filters 
$\{w\}$ can be interpreted as the optimal detectors (matched
filters) for the related signal or image components $\{v\}$
given by:

$$V(\omega) = W(\omega)\,N(\omega) \qquad (7)$$

where $W$ and $V$ are the Fourier Transforms of $w$ and $v$
respectively, and $N$ is the power spectral density of the
noise. In practice it is much easier to apply an invertible
spectral transformation to the signal or image to whiten
the noise before applying the shape adaptive wavelet
transform. In the Fourier domain, such spectral
transformations have the form

$$F'(\omega) = \frac{F(\omega)}{M(\omega)} \qquad (8)$$

where $F$ is the unwhitened signal or image, $F'$ is the
whitened version of $F$ and where $M$ is chosen to satisfy
$|M(\omega)|^2 = N(\omega)$, with $N$ the power spectral density of
the background noise.

Henceforth,  we  therefore  assume  that  the  background  is 
pre-whitened. 
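
The whitening step of equation (8) can be sketched numerically as follows. This is an illustrative sketch under stated assumptions, not the implementation used here: the signal is discrete and periodic, the noise power spectral density is known exactly, and $M$ is taken to be the real positive square root of $N$ (one admissible choice among many). The function names `whiten` and `unwhiten` and the particular spectrum `noise_psd` are hypothetical.

```python
import numpy as np

def whiten(signal, noise_psd):
    """Divide the spectrum of `signal` by M, where |M|^2 = noise_psd (equation (8)).

    `noise_psd` holds one strictly positive value per FFT bin and must be
    symmetric in frequency so that a real signal stays real.
    """
    spectrum = np.fft.fft(signal)
    m = np.sqrt(noise_psd)                 # assumed choice: real, positive M
    return np.fft.ifft(spectrum / m).real

def unwhiten(whitened, noise_psd):
    """Invert `whiten` by multiplying the spectrum by the same M."""
    return np.fft.ifft(np.fft.fft(whitened) * np.sqrt(noise_psd)).real

n = 1024
freqs = np.fft.fftfreq(n)
noise_psd = 1.0 / (0.01 + freqs**2)        # hypothetical low-frequency-dominated spectrum

rng = np.random.default_rng(0)
white = rng.standard_normal(n)
coloured = unwhiten(white, noise_psd)      # colour white noise with the assumed PSD
recovered = whiten(coloured, noise_psd)    # whitening undoes the colouring
```

Because the transformation is invertible, no information is lost; any foreground structure is reshaped by the same spectral division, which is consistent with the relation between $v$ and $w$ in equation (7).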
2.4. SAWT, SNR and approximation error

We  now  explain  how  the  wavelet  space  and  the  SAWT 
are  used  to  extract  foreground  information  from  a  given 
signal  or  image.  Suppose  our  given  signal  or  image  is  a 
Gaussian  white  noise  random  process.  For  a  given  point 
(wavelet)  w  in  the  wavelet  space,  the  variation  of  the 
SAWT  at  that  point  will  also  be  Gaussian  with  zero 
mean,  so  the  statistics  of  the  response  are  characterised 
entirely by standard deviation. This in turn is
proportional to the L2 norm of $w$:

$$\sigma(w) \propto \sqrt{\int |w(u)|^2\,du} = \|w\|_2 \qquad (9)$$

If all the wavelets in the wavelet space have the same L2
norm, then those wavelets $\{w^*\}$ at the local maxima in
the modulus of the SAWT are seen to represent the
signal components with the locally highest SNR, which
can be interpreted as being most likely to belong to the
‘foreground’ of interest and not to the ‘background’ if
the statistical distribution of the SNR values is being
considered. We therefore normalise all the wavelets in
the wavelet spaces considered in this paper to unit L2
norm. This is why there is a scale factor $\sqrt{s}$ dividing the
conventional position-scale wavelet transform.
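
The proportionality between the noise response and the L2 norm of the filter can be illustrated with a small Monte Carlo experiment. This is only a sketch under assumptions of our own: discrete real-valued signals, unit-variance Gaussian white noise, and two Gaussian-shaped filters (`narrow` and `wide`) chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_trials = 64, 20000

def response_std(wavelet, noise):
    """Empirical standard deviation of the filter outputs <noise, wavelet>."""
    return float(np.std(noise @ wavelet))

# Each row is one realisation of unit-variance Gaussian white noise.
noise = rng.standard_normal((n_trials, n_samples))

t = np.arange(n_samples)
narrow = np.exp(-0.5 * ((t - 32) / 2.0) ** 2)
wide = np.exp(-0.5 * ((t - 32) / 8.0) ** 2)

# Without normalisation the noise response scales with the L2 norm of the filter.
std_narrow = response_std(narrow, noise)
std_wide = response_std(wide, noise)

# With unit L2 norm both filters see the same noise level, so their outputs
# can be compared directly as signal-to-noise ratios.
std_narrow_unit = response_std(narrow / np.linalg.norm(narrow), noise)
std_wide_unit = response_std(wide / np.linalg.norm(wide), noise)
```

The unnormalised wide filter gives a larger noise response than the narrow one simply because it has more energy, which is the bias that unit-norm scaling removes.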

Not  only  do  the  local  maxima  represent  signal 
components  most  likely  to  be  foreground,  but,  as  we 
show  next,  they  are  also  the  best  local  approximations  to 
the  signal  data.  In  considering  such  approximations  we 
need  to  define  an  appropriate  measure  of  approximation 
error.  We  require  the  approximation  of  the  signal 
represented  by  the  wavelet  expansion  to  be  stable  with 
respect  to  the  background.  Differences  between  signals 
should  therefore  be  measured  in  terms  of  their  likelihood 
of  arising  at  random  from  the  background,  so  that 
“small”  errors  can  be  disregarded  as  being  of  little 
significance. In the case of Gaussian white noise the L2
norm of the difference between two signals is such a
measure of error, and the one that we will adopt in the
remainder of this paper. For correlated backgrounds the
approximation error is derived by whitening the
difference between the signals prior to the calculation of
the L2 norm. This error is also the maximum signal-to-
noise ratio of the residual when regarded as a signal
embedded in unit energy noise. Suppose any filter $w$ is
employed to detect the residual $f$. Then the signal to
noise ratio is given by:

$$\mathrm{SNR} = \frac{|\langle f, w \rangle|}{\|w\|_2} \le \|f\|_2 \qquad (10)$$

where  we  have  used  the  Cauchy-Schwarz  inequality. 
Thus  the  L2  norm  is  a  measure  of  the  difficulty  of 
detection  of  the  residual  using  a  matched  linear  filter.  If 
this  quantity  is  small,  it  is  natural  to  regard  the  residual 
as  being  small,  and  the  signals  as  being  similar. 


2.5.  Approximation  with  the  SAWT 

We  now  show  that  the  best  local  approximation  to  the 
signal  or  image  is  given  by  the  wavelet  at  a  local 
maximum  in  the  SAWT  multiplied  by  the  value  of  the 
SAWT.  First  we  consider  the  best  approximation  of  the 
signal $f$ with a wavelet $w$ multiplied by the complex
amplitude $A$, where the approximation error is given by:

$$E_f(w, A) = \int_{\mathbb{R}^n} |f(u) - A\,w(u)|^2\,du \qquad (11)$$

We temporarily fix $w$ and consider the minimum of
$E_f(w, A)$ with respect to $A$.

Expanding equation (11) we obtain:

$$E_f(w, A) = \|f\|_2^2 + |A|^2\,\|w\|_2^2 - 2\,\mathrm{Re}\big(\overline{A}\,T_f(w)\big) \qquad (12)$$

We recall $\|w\|_2 = 1$, because the wavelets are normalised
to unit L2 norm, and we write $A = a e^{i\theta}$ and $T_f(w) = b e^{i\phi}$.
Equation (12) then reduces to

$$E_f(w, A) = \|f\|_2^2 + |A|^2 - 2ab\cos(\theta - \phi) \qquad (13)$$

We require a minimum of $E_f(w, A)$ with respect to both
$a$ and $\theta$.

For $\theta$, we see by inspection that the minimum occurs at
$\theta = \phi\ (\pm 2n\pi)$ and (13) then becomes

$$E_f(w, A) = \|f\|_2^2 + a^2 - 2ab \qquad (14)$$

For $a$, we partially differentiate $E_f(w, A)$ with respect
to $a$ and equate to zero to get:

$$a = b; \quad A = T_f(w); \quad E_f(w, A) = \|f\|_2^2 - |T_f(w)|^2 \qquad (15)$$

Thus the local minima of $E_f(w, A)$ occur at the same
locations in the wavelet space as the local maxima of the
modulus of the SAWT, and the optimal amplitude $A$ is
then given by the SAWT value at this point.

The  equivalence  between  local  maxima  of  the  SAWT 
and  local  minima  of  the  error  measure  is  of  considerable 
benefit  when  searching  for  local  approximations  of  the 
signal. Our local approximation to $f$ is given by $Aw$, so to
find the best wavelet $Aw$ directly, we would need to
search over the whole space $\mathbb{C} \times W$; however, the
equivalence  of  local  maxima  of  the  SAWT  and  local 
minima  of  the  error  measure  allows  us  to  search  merely 
for  the  best  wavelet  w.  The  amplitude  A  is  then  given  by 
the  value  of  the  SAWT  at  this  point.  We  can  thus  search 
over  a  space  with  dimension  dim(W)  instead  of 
2  +  dim(W),  and  as  search  times  increase  exponentially 
with  dimension,  this  is  of  considerable  benefit. 
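
The result that the optimal amplitude equals the SAWT value, with residual error $\|f\|_2^2 - |T_f(w)|^2$, can be verified numerically for discrete complex vectors. The sketch below is illustrative only; the random vectors `f` and `w` and the helper names `inner` and `approximation_error` are assumptions made for the example.

```python
import numpy as np

def inner(f, w):
    """Discrete analogue of <f, w> = sum of f(u) * conj(w(u))."""
    return np.vdot(w, f)   # np.vdot conjugates its first argument

def approximation_error(f, w, amplitude):
    """Squared L2 error ||f - A w||^2, the discrete form of equation (11)."""
    return float(np.sum(np.abs(f - amplitude * w) ** 2))

rng = np.random.default_rng(2)
n = 256
f = rng.standard_normal(n) + 1j * rng.standard_normal(n)
w = rng.standard_normal(n) + 1j * rng.standard_normal(n)
w /= np.linalg.norm(w)              # unit L2 norm, as assumed in the text

sawt_value = inner(f, w)            # T_f(w), the claimed optimal amplitude
best_error = approximation_error(f, w, sawt_value)
predicted_error = float(np.linalg.norm(f) ** 2 - abs(sawt_value) ** 2)

# Perturbing the amplitude in any complex direction should increase the error.
perturbed_errors = [
    approximation_error(f, w, sawt_value + 0.1 * np.exp(1j * angle))
    for angle in np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
]
```

Since the amplitude is obtained for free from the SAWT value, only the wavelet parameters need to be searched, which is the dimensional saving described above.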


3. An overview of the method

The  first  stage  in  shape  adaptive  wavelet  analysis  is  the 
search  for  local  maxima  of  the  SAWT.  This  search  is 
conducted  as  follows: 

•  Conduct  a  grid  search  for  local  maxima  of  the 
SAWT  over  a  low-dimensional  sub-manifold  of  the 
wavelet  space. 

•  Locally  optimise  in  the  full  wavelet  space  each  of 
the  local  maxima  found  above. 

•  Record  the  amplitudes  and  positions  in  the  wavelet 
space  of  the  local  maxima  as  discrete  features,  for 
application-specific  analysis. 

•  If  background  rejection  is  required,  select  only  the 
features  that  belong  to  the  foreground.  With  L2 
normalisation,  the  simplest  method  is  to  set  a 
threshold  on  the  SAWT;  this  is  best  for  Gaussian 
noise.  For  more  intermittent  backgrounds  the 
selection  criteria  are  more  complex,  and  covered  in 
[10]. 

•  If  sensor-invariant  recognition  is  required,  convert 
the  discrete  features  into  a  canonical  form  in  which 
‘nuisance  parameters’  are  removed. 

We  describe  the  first  two  stages  in  detail  in  Section  4 
and  show  how  consideration  of  the  geometry  of  the 
wavelet  space  enhances  the  efficiency  of  the  search.  The 
fifth  stage  is  covered  in  Section  6.  If  accurate 
reconstruction  of  the  signal  or  image  data  is  required,  the 
following  additional  stages  are  also  involved  (Section  5): 

•  Redundancy  removal.  The  wavelets  at  the  local 
maxima  in  the  SAWT  are  not  always  orthogonal,  so 
in  general  there  is  redundancy  in  the  SAWT  at  the 
local  maxima.  Although  redundancy  removal  is 
required  for  accurate  signal  reconstructions,  it  is 
often  not  required  in  further  analysis,  for  example  in 
detection  and  sensor-invariant  recognition. 

• Wavelet subtraction. This method extends the
wavelet representation to represent the signal or
image  to  any  required  accuracy,  allowing  full 
inversion  of  the  transform  in  the  limit.  This  step  is 
usually  not  necessary  when  background  rejection  is 
involved. 


4.  Feature  extraction 

We now describe how the signal or image is decomposed
into  a  discrete  set  of  features,  each  a  function  from  the 
wavelet  space  located  at  a  local  maximum  in  the  SAWT. 
There  are  two  stages: 


•  Conduct  a  grid  search  for  local  maxima  of  the 
SAWT  restricted  to  a  low-dimensional  sub-manifold 
V  of  the  wavelet  space  W.  This  sub-manifold  could 
be  the  entire  wavelet  space  if  sufficient  computation 
is  available,  but  in  higher  dimensions  this  is 
generally  not  the  case. 

•  For  each  local  maximum  above  a  given  threshold 
(usually  dictated  by  ambient  noise)  in  the  above  grid 
conduct a local search for the nearest local maximum
of  the  SAWT  in  the  full  space  W. 

This  method,  of  course,  will  not  guarantee  the  detection 
of  all  the  local  maxima  of  the  SAWT,  but  this  guarantee 
would  necessitate  limiting  the  dimension  of  the  wavelet 
space  for  reasons  of  computational  feasibility,  which  is 
no  better.  The  local  search  provides  the  opportunity  of 
improving  the  approximation  to  the  signal  or  image 
without  increasing  the  search  time  exponentially. 
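
The two-stage search can be sketched for a one-dimensional signal as follows. This is a simplified illustration under several assumptions of our own: the wavelet space consists of unit-norm Gaussian atoms parameterised by position and scale, the coarse sub-manifold is the set of positions at a single fixed scale, and the local search is a derivative-free hill climb rather than the Riemannian conjugate-gradient method described in Section 4.1. The names `atom`, `coarse_maxima` and `refine` are hypothetical.

```python
import numpy as np

T = np.arange(256, dtype=float)

def atom(position, scale):
    """Unit-norm Gaussian 'wavelet' sampled on the grid T (illustrative choice)."""
    g = np.exp(-0.5 * ((T - position) / scale) ** 2)
    return g / np.linalg.norm(g)

def sawt(signal, position, scale):
    """Modulus of the inner product of the signal with one dictionary atom."""
    return abs(float(np.dot(signal, atom(position, scale))))

def coarse_maxima(signal, positions, scale, threshold):
    """Stage 1: grid search over position only, at one fixed scale."""
    values = np.array([sawt(signal, p, scale) for p in positions])
    peaks = []
    for i in range(1, len(positions) - 1):
        is_peak = values[i] >= values[i - 1] and values[i] >= values[i + 1]
        if is_peak and values[i] > threshold:
            peaks.append((positions[i], scale))
    return peaks

def refine(signal, position, scale, step=4.0, min_step=1e-3):
    """Stage 2: hill climb in the full (position, scale) space."""
    best = sawt(signal, position, scale)
    while step > min_step:
        improved = False
        for dp, ds in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            p, s = position + dp, max(scale + ds, 0.5)
            value = sawt(signal, p, s)
            if value > best:
                position, scale, best, improved = p, s, value, True
        if not improved:
            step /= 2.0            # shrink the step once no neighbour is better
    return position, scale, best

rng = np.random.default_rng(3)
signal = 3.0 * atom(100.3, 6.0) + 0.02 * rng.standard_normal(T.size)

candidates = coarse_maxima(signal, np.arange(0.0, 256.0, 8.0), scale=4.0, threshold=1.0)
features = [refine(signal, p, s) for p, s in candidates]
```

In this example the coarse grid never evaluates the true atom (neither its position nor its scale lies on the grid), yet the local refinement recovers both parameters, at a cost that grows with the number of coarse peaks rather than with the volume of the full wavelet space.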

A  truly  global  search  over  any  non-trivial  sub-manifold 
is  not  possible  (unless  the  SAWT  is  band-limited) 
because  there  are  an  infinite  number  of  SAWT  values  to 
calculate;  hence,  a  finite  grid  is  chosen  in  such  a  way 
that  the  SAWT  at  other  points  can  be  predicted 
approximately.  To  do  this  efficiently,  however,  we  first 
endow  the  wavelet  space  with  a  Riemannian  metric  that 
quantifies  the  expected  variability  of  the  SAWT.  We 
then  construct  such  a  metric  and  show  how  it  can  be 
used  to  determine  the  size  of  the  grid  (Section  4.3)  and 
also  aid  the  subsequent  local  optimisation,  considered 
next. 


4.1.  Local  optimisation 


There are numerous “off-the-shelf” local optimisation
methods reported in the open literature [11,12] and
implemented in computer software, whose effectiveness
depends strongly on the type of cost function to be
optimised.  We  take  advantage  of  the  second-order 
differentiability  of  the  wavelet  space  (Section  2)  to  use  a 
differential  search  method  which  is  similar  to  the  method 
of  conjugate  gradients  [11],  but  where  we  take  account 
of  the  above  Riemannian  metric.  Not  only  is  the  SAWT 
smooth,  but  its  partial  derivatives  can  be  computed 
analytically  by  differentiation  of  equation  (2): 


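$$\frac{\partial T_f(w)}{\partial x_i} = \frac{\partial}{\partial x_i}\left(\int_{\mathbb{R}^n} f(u)\,\overline{w(u)}\,du\right) = \int_{\mathbb{R}^n} f(u)\,\frac{\partial \overline{w(u)}}{\partial x_i}\,du \qquad (16)$$

where the differentiation is done under the integral sign
with respect to the wavelet co-ordinates $x_i$, not the
independent variable $u$ of the wavelet functions. This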
method  can  be  more  efficient  than  using  approximation 
by  finite  differences,  although  sometimes  analytic 
differentiation  of  the  integrand  is  messy. 
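
As an illustration of differentiating under the integral sign, the sketch below compares analytic partial derivatives of the SAWT with central finite differences. It assumes, purely for the example, a real Gaussian wavelet with co-ordinates (position, scale) whose normalisation constant is taken from the continuous case; the function names are hypothetical.

```python
import numpy as np

T_AXIS = np.arange(512, dtype=float)

def wavelet(position, scale):
    """Gaussian wavelet with unit L2 norm in the continuous limit (illustrative)."""
    return (np.pi * scale**2) ** -0.25 * np.exp(-0.5 * ((T_AXIS - position) / scale) ** 2)

def sawt(signal, position, scale):
    """Discrete inner product of the signal with the wavelet."""
    return float(np.dot(signal, wavelet(position, scale)))

def sawt_gradient(signal, position, scale):
    """Analytic gradient: correlate the signal with the derivative wavelets."""
    w = wavelet(position, scale)
    offset = T_AXIS - position
    dw_dposition = w * offset / scale**2
    dw_dscale = w * (offset**2 / scale**3 - 0.5 / scale)
    return np.array([np.dot(signal, dw_dposition), np.dot(signal, dw_dscale)])

def finite_difference_gradient(signal, position, scale, h=1e-4):
    """Central differences, needing four extra SAWT evaluations."""
    dp = (sawt(signal, position + h, scale) - sawt(signal, position - h, scale)) / (2 * h)
    ds = (sawt(signal, position, scale + h) - sawt(signal, position, scale - h)) / (2 * h)
    return np.array([dp, ds])

rng = np.random.default_rng(5)
signal = rng.standard_normal(T_AXIS.size)
analytic = sawt_gradient(signal, 200.7, 5.0)
numeric = finite_difference_gradient(signal, 200.7, 5.0)
```

The analytic form reuses a single evaluation of the wavelet for both derivatives, whereas the finite-difference form rebuilds the wavelet four times and introduces a step-size parameter that must be tuned.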

Local  search  methods  based  on  partial  derivatives  tend 
to  operate  more  efficiently  if  the  variations  in  the  SAWT 
in each direction are of similar magnitude. For example,
the conventional wavelet transform applied to
1-dimensional signals is a function of two variables:
position  and  scale.  At  small  scales  the  variability  of  this 
function  is  much  greater  with  respect  to  position  than 
scale  and  at  large  scales  the  opposite  is  true.  This 
asymmetric  behaviour  results  in  narrow  ridges  and 
valleys  in  the  cost  function  and  reduces  the  efficiency  of 
search  methods,  which  navigate  these  irregularities  using 
expensive  calculations  of  the  cost  function.  The 
Riemannian  metric,  defined  next,  quantifies  this 
asymmetry,  the  latter  being  co-ordinate  dependent,  so  we 
choose  local  co-ordinates  for  which  this  metric  is 
spherically  symmetric. 

4.2.  Wavelet  space  metrics 
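We define the following metric $d: W \times W \to [0, 1]$:

$$d(w_1, w_2) = 1 - |\langle w_1, w_2 \rangle| \qquad (17)$$

for all $w_1, w_2 \in W$. We show below that this choice of
metric describes the variability of the SAWT, both
statistically in white noise and deterministically. First we
consider the deterministic case. Let $w_1$ be a wavelet in
the wavelet space. We define a metric ball $B(w_1, \varepsilon)$
centred at this wavelet to be given by:

$$B(w_1, \varepsilon) = \{ w_2 \in W : d(w_1, w_2) < \varepsilon \} \qquad (18)$$

Because $|\langle f, w \rangle| = |\langle f, e^{i\theta} w \rangle|$ for all $\theta$, we can choose $\theta$
for any particular choice of $w_1$ and $w_2$ so that

$$\langle w_1, w_2 \rangle = |\langle w_1, w_2 \rangle|\, e^{i\theta} \qquad (19)$$

Using the Cauchy-Schwarz inequality it then follows
that:

$$\begin{aligned}
\big|\,|T_f(w_1)| - |T_f(w_2)|\,\big| &= \big|\,|\langle f, e^{-i\theta} w_1 \rangle| - |\langle f, w_2 \rangle|\,\big| \\
&\le |\langle f, e^{-i\theta} w_1 - w_2 \rangle| \\
&\le \|f\|_2\, \|e^{-i\theta} w_1 - w_2\|_2 \\
&= \|f\|_2 \sqrt{2 - 2\,\mathrm{Re}\,\langle e^{-i\theta} w_1, w_2 \rangle} \\
&= \|f\|_2 \sqrt{2 - 2\,|\langle w_1, w_2 \rangle|} \\
&< \|f\|_2 \sqrt{2\varepsilon}
\end{aligned} \qquad (20)$$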

Not  only  does  (20)  give  an  upper  bound  on  the 
variability  of  the  SAWT  within  a  metric  ball,  it  also 
shows  that  if  the  wavelets  in  the  wavelet  space  vary 
smoothly with respect to a chart or co-ordinate system,
then  their  SAWT  values  will  also  vary  smoothly. 
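
The deterministic bound (20) can be checked numerically for discrete complex vectors. The following sketch is an illustration only; it uses randomly generated unit-norm vectors in place of a structured wavelet space, and the helper names `unit`, `inner` and `metric` are assumptions of the example.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit L2 norm."""
    return v / np.linalg.norm(v)

def inner(f, w):
    """Discrete analogue of <f, w> with conjugation on the second argument."""
    return np.vdot(w, f)

def metric(w1, w2):
    """d(w1, w2) = 1 - |<w1, w2>| for unit-norm wavelets (equation (17))."""
    return max(1.0 - abs(inner(w1, w2)), 0.0)   # clip rounding below zero

def random_complex(rng, n):
    return rng.standard_normal(n) + 1j * rng.standard_normal(n)

rng = np.random.default_rng(4)
n = 128
f = random_complex(rng, n)

gaps, bounds = [], []
for _ in range(1000):
    w1 = unit(random_complex(rng, n))
    # w2 is a small random perturbation of w1, renormalised to unit norm.
    w2 = unit(w1 + 0.2 * unit(random_complex(rng, n)))
    gaps.append(abs(abs(inner(f, w1)) - abs(inner(f, w2))))
    bounds.append(np.linalg.norm(f) * np.sqrt(2.0 * metric(w1, w2)))
```

Every entry of `gaps` should lie below the corresponding entry of `bounds`; how tight the bound is in practice depends on how closely the signal aligns with the difference between the two wavelets.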

We  now  take  the  SAWT  to  be  real-valued  and  consider 
the  statistical  case.  Let  n  be  a  Gaussian  white  noise 
random  process.  The  difference  in  the  SAWT  values  of 


the wavelets $w_1$ and $w_2 \in B(w_1, \varepsilon)$ is given by
$\langle n, w_1 - w_2 \rangle$ which has a mean of zero and a standard
deviation proportional to the L2 norm of $w_1 - w_2$. The
latter is given by

$$\|w_1 - w_2\|_2 = \sqrt{2 - 2\langle w_1, w_2 \rangle} \qquad (21)$$

which for $1 > \varepsilon > 0$ gives $\|w_1 - w_2\|_2 < \sqrt{2\varepsilon}$.

Our choice of metric is in general expensive to compute,
involving the construction of the wavelets from their
co-ordinates and then integration over their common
support to evaluate the inner product. We therefore
employ a less expensive metric, which has similar
properties. We define a Riemannian metric [13]
$R_{ab}\,dx^a dx^b$ at the wavelet $w$ to be half the second
covariant tensor derivative of metric (17) where the first
wavelet is fixed at $w$, and differentiation is with respect to
the second wavelet. In practice, this metric is a second
order local approximation of (17) because the first
partial derivatives are zero, and thus

$$R_{ab}\,dx^a dx^b = 1 - |\langle w_x, w_{x+dx} \rangle| + o(dx^2) \qquad (22)$$

4.3.  Wavelet  grids 

Before explaining how to calculate $R_{ab}$ efficiently, we
show how it can be used to construct global search grids.
Suppose that a grid consisting of wavelets $\{v_i\}$ in the
sub-manifold $V \subset W$ is chosen whose metric balls
$\{B(v_i, \varepsilon)\}$ cover that part of the sub-manifold to be
searched. From (20), we know that by choosing $\varepsilon > 0$
small enough, we can conduct a global search for SAWT
values above a given threshold, $T$ say, by evaluating the
SAWT once in each of the metric balls $\{B(v_i, \varepsilon)\}$ and
testing against a slightly lower threshold of $T(1 - \sqrt{2\varepsilon})$.
The Riemannian metric determines to first order in local
co-ordinates the size of the metric balls $\{B(v_i, \varepsilon)\}$, and
thus the resolution required for the initial grid used in the
global search. It is therefore sufficient to construct a grid
whose spacings $dx$ are to second order constrained by:

$$R_{ab}\,dx^a dx^b < \varepsilon \qquad (23)$$
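
For a one-parameter family the constraint (23) reduces to $R\,dx^2 < \varepsilon$, which gives a direct rule for the grid spacing. The sketch below illustrates this for translations of a unit-norm Gaussian wavelet; the wavelet, its width of 2.0 and the value of `epsilon` are assumptions chosen for the example, and $R$ is estimated numerically from one small displacement rather than from an analytic Hessian.

```python
import numpy as np

T_AXIS = np.linspace(-40.0, 40.0, 8001)
DT = T_AXIS[1] - T_AXIS[0]

def wavelet(position, scale=2.0):
    """Unit-norm Gaussian wavelet translated to `position` (illustrative choice)."""
    g = np.exp(-0.5 * ((T_AXIS - position) / scale) ** 2)
    return g / np.sqrt(np.sum(g**2) * DT)

def metric(position_a, position_b):
    """d = 1 - |<w_a, w_b>|, with the integral approximated by a Riemann sum."""
    return 1.0 - abs(float(np.sum(wavelet(position_a) * wavelet(position_b)) * DT))

def riemannian_coefficient(step=1e-2):
    """Estimate R in d(w_x, w_{x+dx}) ~ R dx^2 from one small displacement."""
    return metric(0.0, step) / step**2

def grid_spacing(epsilon):
    """Displacement dx at which R dx^2 reaches epsilon, to second order."""
    return float(np.sqrt(epsilon / riemannian_coefficient()))

epsilon = 0.05
spacing = grid_spacing(epsilon)
```

For this Gaussian family the inner product of two translates is $\exp(-dx^2 / 4s^2)$, so $R = 1/(4s^2) = 1/16$ here, and wavelets displaced by `spacing` from a grid point remain inside its metric ball of radius `epsilon`.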

4.4.  Local  co-ordinates  and  group  actions 

In  general  the  Hessian  (matrix  of  second  derivatives)  and 
thus  the  Riemannian  tensor  are  expensive  to  compute, 
being  different  at  each  point  in  the  wavelet  space,  but  by 
augmenting  the  manifold  with  additional  group  structure, 
we  can  derive  the  Hessian  at  any  point  in  the  wavelet 
space  from  the  Hessian  at  a  single  point.  The  additional 
structure  comes  by  considering  group  actions  on 
wavelets.  We  now  consider  wavelet  spaces  which  are  the 
orbit of a single (mother) wavelet $w_0$ (analogous to the
mother wavelets used in Meyer [1]),

$$W = G.w_0 = \{ g.w_0 : g \in G \} \qquad (24)$$

under a group $G$. All the examples of wavelet spaces
considered in Section 2 have this group structure:
Fourier transforms involve frequency shifts, wavelet
transforms involve translation and dilation, and Hough
transforms involve translation and rotation. In order to
maintain  the  normalisation  of  the  wavelets,  we  also 
require  that  G  is  isometric,  that  is  it  preserves  inner 
products.  As  we  now  show,  the  group  action  provides  us 
with  a  means  of  deriving  a  local  co-ordinate  system  at 
each  wavelet  w  in  the  wavelet  space  in  which  the 
Riemannian  tensor  at  w  is  identical  to  the  Riemannian 
tensor  at  the  mother  wavelet. 

We  define  the  function  p ;  G  — »  W  from  the  group  to  the 
wavelet  space  by 

p:g\-*g.w0-  (25) 

Because  G.  vv0  spans  the  wavelet  space,  we  can  choose  a 
canonical  function  q:W  — >  G  from  the  wavelet  space  to 
the  group  to  be  a  right  inverse  of  p,  i.e.  p  o  q  is  the 
identity  map  on  W.  We  then  have 

q-.wh^g  where  g.w{)  =  w ,  (26) 

i.e.,  q  is  chosen  to  canonically  map  the  wavelet  w  to  a 
group  element  whose  action  on  the  mother  wavelet 
produces  w. 


For any two wavelets $w_1, w_2 \in W$ we consider the
following combination:

$w = p(q(w_1)\,q(w_2))$,  (27)

which allows us to define an isometric transformation
between local co-ordinate systems which preserves the
co-ordinates of the Riemannian tensor. Let $c_1 : W \to \mathbb{R}^n$
be a local co-ordinate system for which the Riemannian
tensor is equal to R at the mother wavelet $w_0$ and let
$w' \in W$ be any other wavelet, then the local chart which
makes the co-ordinates of the Riemannian tensor equal
to R at $w'$ is given by

$c_2 : w \mapsto c_1 \circ p(q(w')\,q(w))$.  (28)

To see that this transformation is isometric, let
$w_x = c_1^{-1}(x)$ and $w_y = c_2^{-1}(y)$ for the two charts given
above, and choose any two wavelets $w_{y_1}, w_{y_2} \in W$. From
equations (25) and (28) and the isometric nature of G, it
follows that

$\langle p(q(w')q(w_{y_1})),\ p(q(w')q(w_{y_2})) \rangle$
$= \langle q(w')q(w_{y_1}).w_0,\ q(w')q(w_{y_2}).w_0 \rangle$  (29)
$= \langle q(w_{y_1}).w_0,\ q(w_{y_2}).w_0 \rangle$
$= \langle w_{y_1}, w_{y_2} \rangle$.

We thus need only evaluate the Riemannian tensor from
the Hessian matrix at one point in the wavelet space, and
use equation (28) to derive this tensor at any other point.
If we design a local co-ordinate system which has “good
behaviour” at any point, then this co-ordinate
transformation provides us with a means to transport this
behaviour to any other wavelet in the wavelet space. In
our case (Section 2.4) we would like the Riemannian
metric to be spherically symmetric, which is done by
designing a local chart for which the Riemannian tensor
$R_{ab}$ is the identity matrix. It is always possible to do this
because the tensor is symmetric positive definite, so an
orthogonal set of eigenvectors can be found and
normalised so that the diagonal elements of this matrix
are unity. If $R_{ab}$ has the following diagonal
decomposition

$R = U D U^T$  (30)

with respect to the chart $c_2$ where D is diagonal and U is
unitary, then the required local co-ordinate system $c_1$ is
given by solving

$c_2 = D^{1/2}\,U\,c_1(w)$.  (31)

4.5. 1D example: Acoustic signals


To characterise acoustic signals, we consider the Hilbert
space $F_1$ of 1-dimensional signals, a mother wavelet
$w_0 \in F_1$ and a 4-dimensional group G of signal
transformations defined by

[Equation (32): action of the transformation (p, s, a, b) on a signal; right-hand side lost in extraction]

and parameterised by translation p, dilation s, frequency
shift a and chirp shift b, where a and b are normalised
with respect to dilation. Examples of these wavelets are
shown in Fig. 1. Group multiplication is defined to be
the composition of these mappings, and in the above
co-ordinate system is given by:

[Equation (33): group multiplication law; lost in extraction]

The wavelet space W is defined to be the orbit of $w_0$
when acted upon by G, and is thus the set of translations,
dilations, frequency and chirp shifts of the mother
wavelet $w_0$. These functions are sometimes called
chirplets. In this particular case there is a one-to-one
correspondence between G and W, so it is possible to
identify each wavelet with its group action on the mother
wavelet, and use the same co-ordinate system for both.

This chirp wavelet space $W = G.w_0$ is a superset of the
dictionaries used in the Fourier, Wavelet and Gabor
Transforms. Any section of the SAWT restricted to
constant a and b is a conventional Wavelet Transform
parameterised by position p and scale s. The parameters
a and b determine the shape of the mother wavelet in
each of these sub-manifolds.


The initial search for local maxima in the SAWT is
restricted to the 3-dimensional sub-manifold where the
chirp angle b is zero. This subspace can be
parameterised by position, scale and ordinary frequency
$\alpha = a / s$. The Riemannian tensor with respect to these
co-ordinates is given by:

$R = \begin{pmatrix} A/s^2 & 0 & 0 \\ 0 & B/s^2 & 0 \\ 0 & 0 & C s^2 \end{pmatrix}$  (34)

where A, B, C are constants which depend on the mother
wavelet. An approximately Gaussian envelope was
chosen for the mother wavelet which is of compact
support and twice continuously differentiable, ensuring
that the SAWT is equally smooth. We require a search
grid for which the Riemannian metric is as even as
possible for an even coverage of the wavelet space.
Equations (20) and (23) imply that the spacings dp and
ds should be proportional to scale s and the spacing $d\alpha$
should be inversely proportional to scale, where the
constants of proportionality are given by the reciprocals
of the square roots of A, B and C. A grid meeting these
requirements is obtained by dividing the sub-manifold
into sections of constant scale, and in each section
defining a regular rectangular grid where the spacing in
position is proportional to scale and that of frequency
inversely proportional to scale, so that different grids are
used at different scales. The scales themselves need to be
a geometric sequence to generate the correct spacings ds.
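
The grid construction described above can be sketched as follows. This is only an illustration under stated assumptions: the tensor is taken to be diagonal with entries A/s², B/s² and C·s² in the co-ordinates (position, scale, frequency), each axis is bounded separately using constraint (23), and the names `scale_sequence` and `grid_spacings` are hypothetical.

```python
import math

def scale_sequence(s_min, s_max, ratio=2.0):
    """Geometric sequence of scales from s_min up to at least s_max."""
    steps = math.ceil(math.log(s_max / s_min) / math.log(ratio) - 1e-12)
    return [s_min * ratio ** k for k in range(steps + 1)]

def grid_spacings(s, eps, A, B, C):
    """Per-axis upper bounds on the grid spacing at scale s.

    Assumes R = diag(A / s**2, B / s**2, C * s**2), so that each bound
    solves R_aa * dx_a**2 = eps along one axis.
    """
    root = math.sqrt(eps)
    return {
        "dp": root * s / math.sqrt(A),            # position: proportional to s
        "ds": root * s / math.sqrt(B),            # scale: proportional to s
        "dalpha": root / (s * math.sqrt(C)),      # frequency: proportional to 1/s
    }
```

For instance, `scale_sequence(1.0, 1000.0)` returns 11 scales, the last being 1024, and doubling `s` doubles `dp` and `ds` while halving `dalpha`.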

The SAWT at each fixed scale, essentially a Gabor
Transform, is computed efficiently as a set of windowed
FFTs. With a Gaussian envelope a low resolution in
scale is required, with powers of 2 usually being
sufficient, in which case a scale range of 1000:1 is
covered with only 10 scale intervals and 11 Gabor
Transforms. The complex exponential wavelets are
better suited to characterising an analytic signal, where
there is no energy at negative frequencies in the Fourier
Transform, than a real-valued signal, which has equal
energy at positive and negative frequencies. For
real-valued signals the Hilbert Transform is applied first to
create an analytic signal, where sinusoids are converted
to complex exponentials.
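
The conversion of a real-valued signal to an analytic signal can be illustrated with a short sketch. It uses the standard frequency-domain construction (zero the negative frequencies, double the positive ones) with a plain O(n²) DFT for self-containment; the paper does not specify its implementation, so this is an assumption, and a practical system would use an FFT.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct discrete Fourier transform (O(n^2)); adequate for short signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * j * k / n) for k in range(n))
           for j in range(n)]
    return [v / n for v in out] if inverse else out

def analytic_signal(x):
    """Analytic signal of a real sequence: negative frequencies removed."""
    n = len(x)
    spectrum = dft(x)
    weights = [0.0] * n
    weights[0] = 1.0                      # keep the DC term unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist term unchanged
        for k in range(1, n // 2):
            weights[k] = 2.0              # double the positive frequencies
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    return dft([spectrum[k] * weights[k] for k in range(n)], inverse=True)
```

Applied to a sampled cosine, this returns the corresponding complex exponential, whose real part is the original signal.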


The use of complex exponentials in the wavelet has a
number of advantages over real sinusoids. The phase of
the sinusoid is controlled by wavelet amplitude and so is
not required as an additional parameter in the search
space, making the search more efficient. A single
wavelet in the signal data results in a single local
maximum in the SAWT in the complex case, whereas
there are usually many local maxima in the real case,
caused by the sinusoid in the signal being alternately in
and out of phase with the filter. The lack of "side-lobes"
in the complex case greatly reduces the redundancy of
the wavelet characterisation. The smoother SAWT also
enables the local search for local maxima to work more
efficiently. The group G is also easier to define in the
complex case.
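
The phase argument above can be checked with a small numerical sketch. It is a simplification that assumes pure sinusoids over a whole number of periods with no envelope (so it is not the SAWT itself), and the function names are illustrative.

```python
import cmath
import math

def complex_response(phase, n=64, k=5):
    """Magnitude of the inner product of a phased sinusoid with a complex exponential."""
    total = 0j
    for t in range(n):
        theta = 2 * math.pi * k * t / n
        total += math.cos(theta + phase) * cmath.exp(-1j * theta)
    return abs(total)

def real_response(phase, n=64, k=5):
    """Inner product of the same phased sinusoid with a real cosine filter."""
    total = 0.0
    for t in range(n):
        theta = 2 * math.pi * k * t / n
        total += math.cos(theta + phase) * math.cos(theta)
    return total
```

The complex response has magnitude n/2 for every phase, whereas the real response varies as (n/2)·cos(phase) and vanishes when signal and filter are in quadrature.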

Fig. 2 demonstrates the chirp wavelet representation on
an example of an underwater acoustic signal attributed to
dolphins. Fig. 2a shows a spectrogram (Gabor
Transform) of the signal data, showing the variation of
signal energy with time and frequency, in which several
coherent structures can be perceived against a noisy
background. Fig. 2b shows the corresponding
spectrogram of a signal reconstruction from a chirp
wavelet decomposition, which has extracted most of the
interesting structure. Only 15 chirp wavelets are used in
the characterisation, demonstrating the economy of the
method. The wavelets cover a wide range of frequency,
scale and chirp angle, and it would not be possible to
characterise this data as efficiently with more limited
dictionaries such as the Gabor or Wavelet dictionaries.

4.6.  2D  example:  Sea  surface  imagery 


To characterise 2D imagery, we consider the Hilbert
space $F_2$ of images, and we take our wavelet space W to
consist of the Gaussian ellipses:

$w_{(x,y,s,a,\theta)}(u, v) = \exp(\,\cdots\,)$, where the exponent (lost in extraction) is written in terms of the rotated, centred co-ordinates
$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} u - x \\ v - y \end{pmatrix}$.  (35)

Our wavelets thus have a chart to $\mathbb{R}^5$, where (x, y) is the
position of the centre of the wavelet relative to some
origin, s is the scale of the wavelet, a is its aspect ratio
and $\theta$ is its orientation with respect to the y-axis. We
choose the constant $a_0$ to be larger than unity to remove
a co-ordinate system singularity which occurs because of
the lack of effect rotation has upon wavelets with aspect
ratio one.


The natural choice of image transformations for this
wavelet space would contain translation (for position),
dilation (for scale), rotation (for orientation) and
stretching (for aspect ratio); however, the smallest group
containing these transformations is the affine group,
which is six-dimensional and includes skew
transformations. We parameterise this group G as
$\{(A, b)\}$ where A is a 2x2 matrix composed of rotation,
dilation, stretching and skew transformations, and where
b is a column vector representing translation. We define
group multiplication to be given by

$(A_1, b_1)(A_2, b_2) = (A_1 A_2,\ A_1 b_2 + b_1)$  (36)

and the inner product preserving group action on $F_2$ to be
given by

$[(A, b).w](u) = \frac{1}{\sqrt{\det(A)}}\, w\left(A^{-1}(u - b)\right)$  (37)

for all $w \in F_2$.
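
A minimal sketch of the group law (36) and the action (37) is given below, using nested lists for 2x2 matrices. The helper names are illustrative, and the action assumes det A > 0 so that the square root in (37) is real.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(A, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [A[i][0] * v[0] + A[i][1] * v[1] for i in range(2)]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv(A):
    d = det(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

def multiply(g1, g2):
    """Group product (A1, b1)(A2, b2) = (A1 A2, A1 b2 + b1), as in (36)."""
    (A1, b1), (A2, b2) = g1, g2
    Ab = matvec(A1, b2)
    return matmul(A1, A2), [Ab[0] + b1[0], Ab[1] + b1[1]]

def act(g, w):
    """Action (37): returns the function u -> w(A^{-1}(u - b)) / sqrt(det A)."""
    A, b = g
    A_inv = inv(A)
    scale = 1.0 / math.sqrt(det(A))   # assumes det A > 0
    return lambda u: scale * w(matvec(A_inv, [u[0] - b[0], u[1] - b[1]]))
```

With these definitions, acting with the product `multiply(g1, g2)` gives the same function as acting with `g2` and then with `g1`, which is the compatibility between (36) and (37) that the text relies on.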

Unlike the previous example, the parameter spaces of the
wavelet space and the group can no longer be
directly identified with one another by judicious choice
of charts. Indeed, while the group G can only be charted
to 6-dimensional Euclidean space, the wavelet space W
has been charted to 5-dimensional space. This means
that the co-ordinate versions of the functions $p : G \to W$
and $q : W \to G$ used in the chart transformation (28) are
no longer trivial. We could have removed this difficulty
by choosing a chart mapping the wavelet space to $\mathbb{R}^6$,
but this would have increased the dimension and thus the
difficulty of searching for local maxima in the SAWT.

The initial search for local maxima is restricted to a
4-dimensional sub-manifold with the bar aspect ratio fixed
to 3, that is, fairly broad bars. In most wavelet spaces
a four-dimensional search is too computationally demanding, but
for broad bars the SAWT is insensitive to orientation and
scale, so only a small number of combinations of these
parameters is required in the search grid. Where there
are long narrow objects, such as the ship wakes in Fig. 3,
the local search for unconstrained local maxima in the
SAWT extends the broad bars along these structures.
The SAWT is much more sensitive to orientation for
narrow bars than broad bars, but this and other
dependencies on location in the wavelet space are
removed by the local co-ordinate systems defined in
Section 4.4.

Fig.  3  illustrates  the  use  of  bar  wavelets  to  detect  and 
characterise  ships  and  their  wakes  from  electro-optic 
imagery  of  the  sea  surface.  Both  extended  structures,  the 
wakes,  and  a  more  compact  structure,  the  ship,  are 
characterised  efficiently  by  this  method.  The  ship’s 
course,  including  speed  and  direction,  can  be  accurately 
inferred  from  the  wavelet  parameters,  including  the 
change  in  course  half  way  along  the  wake. 


5.  Removing  redundancy  and  reconstruction 

In general the wavelets $\{w_k\}$ detected as local maxima
in the SAWT are not mutually orthogonal. This has two
undesirable effects:

•  More wavelets than necessary may be required to
represent a given signal or image (or the
foreground) with a given level of accuracy.

•  Optimal reconstruction is no longer obtained by the
linear superposition of single wavelet
reconstructions.

Traditionally these problems are avoided by designing
the wavelet basis functions to be orthogonal, for example
using the wavelets of Daubechies [2]. This approach is
fundamentally incompatible with optimisation in a
differentiable manifold of dictionary functions because
there is an open neighbourhood $B(w_1, \varepsilon)$ of each
wavelet $w_1$ in which $|\langle w_1, w_2 \rangle|$ is greater than zero. The
inherent flexibility of the local search method, where
local approximations to the signal or image are
optimised, and where there is potential for fine
adjustments to the wavelet parameters, is incompatible
with the necessary constraints required for orthogonality.


5.1.  Gram-Schmidt  orthogonalisation 


We therefore remove the constraint of orthogonality in
the wavelet decomposition and instead remove
redundancy either by excluding wavelets or by adjusting
the coefficients in the wavelet series expansion, initially
given by the SAWT. The process is based on
Gram-Schmidt orthogonalisation, where a temporary
orthonormal basis $\{e_n\}$ is constructed as linear
combinations of wavelets $\{w_k\}$ extracted as local
maxima in the SAWT. The new basis functions are
given by:

$E_n = w_n - \sum_{k=1}^{n-1} \langle w_n, e_k \rangle\, e_k$  (38)

where f is the signal or image data. Instead of calculating
the signal or image functions $\{E_n\}$ and $\{e_n\}$ directly,
we calculate recursively the coefficients of the linear
combinations of $\{w_k\}$ from which they are constructed
using equation (38).


Unlike ordinary Gram-Schmidt orthogonalisation, the
order in which the wavelets $\{w_k\}$ are chosen is not fixed
but is adapted to maximise the rate of decrease of the
residual sum-of-squares error. This is achieved by
choosing $w_n$ to maximise the inner product:

$I_n = |\langle f, e_n \rangle|$  (39)

which in turn minimises the residual sum-of-squares
error given by:

$\|f\|^2 - \sum_{k=1}^{n} I_k^2$.  (40)


The process stops when the values of I for all the
remaining wavelets are below the amplitude
threshold for unusual objects, that is when none of the
wavelets contributes sufficient information about the
signal data f which is independent of the wavelets
already included in the expansion. The value I for each
remaining candidate wavelet $w_n$ is derived from the inner
products $\langle f, w_n \rangle$ and $\langle w_m, w_n \rangle$, where the former is
given by the SAWT (4) and the latter are independent of
the signal and depend solely on the wavelet parameters.
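
The adaptive ordering can be sketched for real, finite-dimensional vectors as follows. This is an illustration only: it re-orthogonalises each candidate directly instead of using recurrence relations on the inner products, and the function name and list-based representation are assumptions rather than the paper's implementation.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def adaptive_gram_schmidt(f, wavelets, threshold):
    """Greedy Gram-Schmidt: at each step pick the wavelet whose orthonormalised
    residue e_n has the largest |<f, e_n>|, stopping below the threshold."""
    basis, chosen, coeffs = [], [], []
    remaining = list(range(len(wavelets)))
    while remaining:
        best = None
        for i in remaining:
            # Remove from w_i its components along the basis chosen so far.
            E = list(wavelets[i])
            for e in basis:
                c = dot(wavelets[i], e)
                E = [a - c * b for a, b in zip(E, e)]
            norm = math.sqrt(dot(E, E))
            if norm < 1e-12:          # w_i is already spanned by the basis
                continue
            e_new = [a / norm for a in E]
            I = abs(dot(f, e_new))
            if best is None or I > best[0]:
                best = (I, i, e_new)
        if best is None or best[0] < threshold:
            break
        _, i, e_new = best
        basis.append(e_new)
        chosen.append(i)
        coeffs.append(dot(f, e_new))
        remaining.remove(i)
    # Residual sum of squares after projecting f onto the chosen basis.
    residual = dot(f, f) - sum(c * c for c in coeffs)
    return chosen, coeffs, residual
```

The returned `residual` is the squared norm of the part of `f` not explained by the selected wavelets, so it decreases by the square of the selected coefficient at every step.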
The following recurrence relations are used to derive I:

[Equations (41) and (42): recurrence relations; only the summation limits k = 1, ..., n - 1 survive in the source]

5.2. Wavelet subtraction

The above wavelet expansion is generally a good
approximation to the signal or image, but is not usually
an exact representation, and therefore not in general
invertible. In cases in which the background is not of
interest and its rejection is required, the incompleteness
of the representation is usually not a drawback as the
residual error is dominated by information which is not
of interest. In situations where the accuracy of the
wavelet reconstruction is not sufficient, however, a
process called wavelet subtraction is employed which
adds further wavelets to the expansion to achieve an
arbitrarily small residual.

The wavelet subtraction process starts with a wavelet
expansion from local maxima in the original SAWT with
redundancy removed using Gram-Schmidt
orthogonalisation. The signal reconstruction from this
expansion is then subtracted from the original signal to
leave the residual $f_r$. A new wavelet expansion is then
computed for the residual $f_r$, in which local maxima of a
new SAWT are computed. The new wavelets are then
added to the original set and redundancy removal is
repeated for this union. This process is repeated until the
required level of accuracy is obtained.

6. Sensor invariant representation

We now consider how to represent foreground
information, with the aid of the wavelet features
extracted using shape adaptive wavelet analysis, in a
manner which depends as little as possible on the
relationship between the objects being viewed and the
sensor. A frequently encountered example of this [14] is
the dependence on the relative 3D position and
orientation of the object and an imaging sensor. This
example will be considered in this paper to illustrate the
method. Another example is the effect of range and
relative velocity on acoustic data, where range affects
amplitude and timing of acoustic transients, and changes
in velocity cause a Doppler frequency shift.

6.1. Sensor-related transformations


In general we represent sensor-related effects as a set of
possible transformations G acting on the signal or image
data. The objective is to derive from the wavelet features
a set of numbers which does not change under the action
of any transformation in G, but from which all other
attributes of the ensemble of features can be determined
uniquely. If this objective can be met, these numbers can
be used to classify objects in a sensor-invariant manner.
Given a set of image features $\{w_i\}$ that represent some
real-world object, and a transformation $g \in G$, we can
form another set of features $\{g.w_i\}$, where $g.w_i$ denotes
the action of the transformation g on the wavelet feature
$w_i$. (We encountered a similar situation in Section 4.4,
where the wavelet space itself was generated as the orbit
of a group of transformations acting on the mother
wavelet.) These feature sets are considered equivalent
because they may represent the same target from a
different viewpoint.


6.2.  Canonical  forms  and  alignment  functions 

Wavelet  sets  which  are  linked  in  this  way  by  a 
transformation  in  G  form  an  equivalence  class,  and  our 
approach  is  to  define  a  unique  member  for  each  such 
equivalence  class,  which  we  call  a  canonical  form.  The 
parameters  of  the  image  features  in  the  canonical  form 
are  the  sensor-invariant  numbers  we  require.  Our  aim  is 
to  derive  this  canonical  form  from  any  wavelet  set 
belonging  to  the  same  equivalence  class,  because  then 
we  can  characterise  the  real-world  object  in  the  same 
way,  no  matter  what  the  sensor  configuration  is  when  the 
object  is  detected.  We  do  this  by  finding  an  element  g  in 

G which will transform any given feature set $\{w_1, \dots, w_n\}$ to
the canonical form $\{g.w_1, \dots, g.w_n\}$. The transformation
g will depend on the set $\{w_1, \dots, w_n\}$ but will transform
equivalent feature sets to the same canonical form $\{c_1, \dots, c_n\}$.
To find g, we introduce an alignment function m that
returns a vector of measurements for a given set of
features,

$x = (x_1, \dots, x_N)$  (43)

where N is the dimension of the transformation group G.
The purpose of the alignment function is to provide
enough constraints on the feature set to fix g. First we
take an alignment measurement of the canonical form,

$m(c_1, \dots, c_n) = m_0$.  (44)

We now find the transformation g that satisfies

$m(g.w_1, \dots, g.w_n) = m_0$.  (45)

This involves solving a set of simultaneous equations,
and we choose the alignment function such that these
equations are non-singular, and ideally easy to solve, for
example by being linear. The transformation g will then
also satisfy

$\{g.w_1, \dots, g.w_n\} = \{c_1, \dots, c_n\}$.  (46)

To see that the canonical forms are invariant under group
transformation, suppose we have another set of features
from the same equivalence class

$\{h.w_1, \dots, h.w_n\}$.  (47)

The transformation which ensures that the alignment
function applied to these features matches the canonical
alignment measurement $m_0$ is then $g h^{-1}$. When applied
to the new features we once again produce the canonical
set

$\{g h^{-1} h.w_1, \dots, g h^{-1} h.w_n\} = \{g.w_1, \dots, g.w_n\} = \{c_1, \dots, c_n\}$.  (48)

6.3. Example alignment functions

We now consider three examples of alignment functions,
each requiring a choice of features from the set
comprising the foreground, which we call anchor
features. The choice of anchor feature(s) leaves an
ambiguity in the canonical form; however, for a small
number of anchor features and a moderate number of
foreground features, the number of permutations is
manageable. In the examples given in Section 7.2, the
number of wavelets is of the order of 5 and the number of
anchor wavelets no more than 3, so the number of
permutations is typically 20 or less.

The first example is applicable to any wavelet
decomposition for which the sensor-related
transformations are the same as the group generating the
wavelet space from the mother wavelet (Section 4.4). In
this case we consider the alignment function which
returns the co-ordinates of one of the features in the set:

$m(w_1, \dots, w_n) = w'_1$  (49)

where $w'_1$ are the co-ordinates of the chosen wavelet $w_1$,
for some co-ordinate system. We choose the anchor
wavelet to be the mother wavelet. The canonical
alignment measurement then becomes $c'_1 = w'_0$. We now
make use of the mapping q defined in equation (26) of
Section 4.4 which associates group actions with
wavelets. In the case where there is a one-to-one
correspondence between the wavelet space and the group
generating it, q is bijective, and the group transformation
which satisfies equation (45) for our particular alignment
function is $q(w_1)^{-1}$. The canonical wavelets are then
produced through application of $q(w_1)^{-1}$ to each wavelet
in the set

$\{q(w_1)^{-1}.w_1, \dots, q(w_1)^{-1}.w_n\}$.  (50)

The values in equation (50) represent the relationships
between each image feature and the anchor feature, and
these relationships, being group quotients, are invariant
under the action of the group.

There are many examples of wavelet group actions
where q is bijective, for example any group spanned by
any combination of translation, dilation, and frequency
shift, including the wavelet and Gabor transforms. The
above technique applied to features generated by the
Gabor transform is therefore useful for acoustic transient
recognition, being invariant to time of arrival, range and
Doppler shifts.

The next two examples concern aspect-invariant
recognition of objects in 3D space viewed at long range,
where we follow the commonly adopted approach of
modelling changes in viewing aspect as affine
transformations in the image of the scene [14]. We use
the wavelet space generated from the affine group acting
on a Gaussian radial basis function (Section 7.2).
Unfortunately the mapping q is not bijective, as the
affine group has 6 dimensions and the wavelet space
only 5, so the alignment measurement is insufficient to
uniquely specify the transformation satisfying equation
(45). Instead, we require alignment functions which
return a 6-element vector, as in the remaining two
examples.

The second alignment function returns the positions of
three anchor wavelets within the set,

$m(w_1, \dots, w_n) = (x_{w_1}, y_{w_1}, x_{w_2}, y_{w_2}, x_{w_3}, y_{w_3})$.  (51)


With this particular case it is usually possible to identify
uniquely the transformation satisfying equation (45).
Suppose that the canonical alignment measurement is
$(x_{c_1}, y_{c_1}, x_{c_2}, y_{c_2}, x_{c_3}, y_{c_3})$, and the affine transformation to
be determined is given by (a, b, c, d, e, f), where
$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is the composite matrix of scaling, aspect
change, skewness and rotation, and $\begin{pmatrix} e \\ f \end{pmatrix}$ is the
translation offset. The transformation is then determined
by the linear system

$a\,x_{w_i} + b\,y_{w_i} + e = x_{c_i}, \quad c\,x_{w_i} + d\,y_{w_i} + f = y_{c_i}, \quad i = 1, 2, 3.$  (52)

If the feature centres are not collinear, the system is
non-singular and therefore has a unique solution. Collinearity
can be detected prior to the derivation of the canonical
form and such cases rejected at an earlier stage in the
recognition process. Anchor features in known target
types can be chosen to be strongly non-collinear, unless
the target has a particularly simple structure, in which
case it should be easy to recognise anyway.
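
The three-anchor alignment can be sketched as below. The sketch handles wavelet centres only (not scale, aspect ratio or orientation), solves the six linear equations by Cramer's rule, and maps the anchors to the points (0, 0), (1, 0) and (0, 1); that choice of canonical anchor positions, and all function names, are assumptions made for illustration.

```python
def solve_affine(src, dst):
    """Return (a, b, c, d, e, f) mapping three src points onto three dst points."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        raise ValueError("anchor centres are collinear")

    def solve(v1, v2, v3):
        # Cramer's rule for p*x + q*y + r = v at the three source points.
        p = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        q = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        r = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return p, q, r

    a, b, e = solve(*(p[0] for p in dst))
    c, d, f = solve(*(p[1] for p in dst))
    return a, b, c, d, e, f

def apply_affine(g, point):
    """Apply (a, b, c, d, e, f) to a point (x, y)."""
    a, b, c, d, e, f = g
    x, y = point
    return (a * x + b * y + e, c * x + d * y + f)

def canonical_form(centres, anchors=(0, 1, 2),
                   target=((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))):
    """Transform all centres so that the anchor centres land on the target points."""
    g = solve_affine([centres[i] for i in anchors], target)
    return [apply_affine(g, p) for p in centres]
```

Applying any invertible affine transformation to the input centres before calling `canonical_form` leaves the output unchanged, which is the invariance expressed by equation (48); collinear anchors raise an error, matching the singular case described above.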


The third alignment function returns the positions and
orientations of two anchor wavelets

$m(w_1, \dots, w_n) = (x_{w_1}, y_{w_1}, \theta_{w_1}, x_{w_2}, y_{w_2}, \theta_{w_2})$.  (53)

The transformation satisfying equation (45) is
determined in a similar manner to the previous example.
First, three sets of positions are produced from the
measurement, where the first two correspond to the
positions of the anchor wavelets, and the third set is the
position of the intersection of the major axes of the two
anchor wavelets. Once more there are singularities
within the alignment function, in this case when the two
anchor wavelets are parallel. As in the previous case,
a judicious choice of image features in the
canonical form should in most cases avoid this type of singularity.


6.4.  Examples  of  3D  Aspect-Invariant  recognition 

We  now  present  results  for  example  aircraft  appearing  in 
infrared  imagery.  We  first  illustrate  the  uniqueness  of  the 
canonical  forms  for  different  aircraft,  demonstrating 
their  potential  for  class  discrimination.  Fig.  4  shows 
infrared  images  of  the  Sukhoi-30  fighter,  the  B2  Stealth 
Bomber  and  a  transporter  plane.  Fig.  5  shows  a  wavelet 
approximation  of  each  of  the  aircraft.  Notice  how  the 
Sukhoi-30  and  B2  have  been  approximated  by  wavelets 
along  the  edges  of  the  aircraft  outlines,  whereas  the 
transporter  plane  has  had  wavelets  fitted  to  the  main 
fuselage  and  wings.  The  canonical  forms  for  these 
aircraft  are  shown  in  Fig.  6,  and  their  dissimilarity 
clearly  demonstrates  the  potential  ease  with  which 
different  types  of  aircraft  can  be  distinguished. 


In Figs. 7 and 8 we demonstrate the invariance of the
canonical form to different viewing aspects. Fig. 7
shows four different views of the transporter plane. In
Fig. 8 we show different canonical forms of these
images superimposed to illustrate their similarity. The
three wavelets chosen to be the anchors are the fuselage,
the tail wing and one of the engines. In Fig. 8(a) we
compare the canonical forms extracted from Fig. 7(a) and
Fig. 7(b), and in Fig. 8(b) we compare the canonical
forms extracted from Fig. 7(c) and Fig. 7(d). Most of the
image feature parameters have changed very little,
indicating that a template classifier would have little
difficulty in recognising the similarity of the aircraft
from this type of representation. This similarity has been
observed for real data, in spite of the limitations of the
affine transformation model, which does not take
account of pixel aliasing, occlusion, non-planarity and
sensor distortion.

In Figs. 9 and 10 we demonstrate the stability of the
canonical representations of the transporter plane to the
addition of simulated noise, to investigate the likely
performance with either a degraded sensor or
atmospheric obscuration. There is already some noise in
the original image data, but this data is of higher quality
than could be expected in poor weather. Each
canonical form has been generated using the second
alignment function of Section 6.3, where the anchor
wavelets are chosen to be the fuselage, the tail wing and
one of the engines. Fig. 9(a) shows the image with
Gaussian noise added and Fig. 10(a) compares the
canonical form generated from this image with that of
the original image. Figs. 9(b) and 10(b) show similar
results with a lower SNR, where it can be seen that even
in this case the canonical forms are still very similar.

Fig. 1: Examples of 1D wavelets. (a) Mother wavelet. (b) Different position and scale. (c) Different frequency. (d) Different chirp angle.

Fig. 2a: Spectrogram of underwater acoustic signal.

Fig. 9: Transporter image with extra Gaussian noise. (a) Medium noise. (b) High noise.

Fig. 10: Canonical forms with alignment function 2. (a) Medium noise. (b) High noise.

Abstract

Multirate filter banks are the fundamental
building blocks of efficient wavelet and
wavelet-packet implementations and are thus an important
part of the current generation of image compression
algorithms. In addition, multirate concepts
also form the basis of local cosine transforms
(LCTs) and lapped orthogonal transforms (LOTs)
which are used in audio compression and noise
removal applications. In this paper, we first discuss
the fundamentals of multirate filter banks,
both theory and implementation. Next, we focus
on applications. In particular, we highlight
wavelet-based image compression because research
in this area has been very productive in
recent years. We consider in particular embedded
algorithms such as embedded zerotree wavelet
(EZW) and set partitioning in hierarchical trees
(SPIHT) because of the valuable capabilities they
provide in a variety of military applications. Finally,
we also discuss additional areas in which
multirate filter banks play a role including
interference excision, signal scrambling, and coded
orthogonal frequency division multiplexing
(COFDM) for data transmission.


1. Introduction

The theory of multirate signal processing has proven itself useful in a
variety of applications over the last ten years. It is in
communications, however, that multirate systems have thus far had the
most impact. For example, the transforms used today in state-of-the-art
image and video compression algorithms are themselves multirate systems.
The discrete cosine transform (DCT) used in JPEG and MPEG can be viewed
as a maximally decimated 2-dimensional 64-band multirate filter bank
[1], [2]. Furthermore, the wavelet transforms used in the best still
image compression algorithms (such as the upcoming JPEG 2000 standard)
are themselves implemented as multirate filter banks [3]. Multirate
filter banks are also at the center of all of the existing algorithms
for wideband audio compression [4]. For example, the popular MPEG audio
layers 1, 2, and 3 (the latter called MP3 by many) all use a 32-band
cosine modulated multirate filter bank at their cores. Other algorithms
such as Dolby AC-3 (now called Dolby Digital) and MPEG AAC (advanced
audio coder) use lapped orthogonal transforms (LOTs), which are simply a
special case of the general cosine modulated filter bank.

While multirate theory has had its greatest impact on compression
systems, it has also influenced other areas of the communications field.
In particular, multirate systems form the basis of time-frequency
scrambling methods for secure voice communications [5], and they are now
also being used to generate broadcast waveforms for coded orthogonal
frequency division multiplexing (COFDM) [6]. Such systems are currently
being used to broadcast digital radio in Europe and are likely to be
used in the future for the broadcast of digital television as well.

This paper is organized as follows. Section 2 discusses the fundamentals
of multirate signal processing, introducing the concepts of upsamplers
and downsamplers. In Section 3, we introduce perfect reconstruction
filter banks, and in Section 4 we discuss the application of such filter
banks to image compression, wideband audio compression, noise removal,
coded orthogonal frequency division multiplexing, and signal scrambling.
Conclusions are then presented in Section 5.


2. Fundamentals

To describe a digital filter, we use here three different
representations [7]. In the time (or spatial) domain, a 1-dimensional
(1D) digital filter is given by its impulse response $h(n)$ for
$n \in (N_1, N_2)$. If $N_1$ and $N_2$ are finite, then $h(n)$ is a
finite impulse response (FIR) filter; otherwise, it is an infinite
impulse response (IIR) filter. We can also uniquely characterize this
filter in the frequency domain by its Fourier transform as

$$H(e^{j\omega}) = \sum_{n} h(n)\, e^{-j\omega n} \tag{1}$$

or in the z-domain by

$$H(z) = \sum_{n} h(n)\, z^{-n}. \tag{2}$$

From a practical standpoint, we often deal with a discretized version of
(1) called the discrete Fourier transform (DFT). The DFT evaluates (1)
at M uniformly spaced frequencies $\omega_k = 2\pi k/M$: i.e.,

$$H(k) = \sum_{n=0}^{M-1} h(n)\, e^{-j 2\pi k n / M}. \tag{3}$$
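As a small illustration of (3), the sketch below evaluates the DFT sum directly and can be compared against a library FFT. The function name `dft_direct` and the 3-tap example filter are assumptions made for this illustration only; they do not come from the paper.

```python
import numpy as np

def dft_direct(h, M):
    """Evaluate H(k) = sum_n h(n) exp(-j*2*pi*k*n/M) for k = 0..M-1."""
    h = np.asarray(h, dtype=complex)
    if len(h) > M:
        raise ValueError("M must be at least the filter length")
    # Zero-pad the impulse response to M samples.
    h = np.concatenate([h, np.zeros(M - len(h), dtype=complex)])
    n = np.arange(M)
    k = n.reshape(-1, 1)
    W = np.exp(-2j * np.pi * k * n / M)   # M x M DFT matrix
    return W @ h

h = np.array([0.25, 0.5, 0.25])   # illustrative 3-tap lowpass filter
H = dft_direct(h, 8)
```

The direct sum costs on the order of M^2 operations; in practice one would call an FFT routine such as `np.fft.fft(h, 8)`, which returns the same values.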

The two basic operations of multirate digital signal processing are
downsampling and upsampling, shown in Figs. 1 and 2, respectively [8].
To upsample a digitized signal by a factor L, we insert L-1 zeros
between each pair of adjacent input samples to increase the sampling
rate; to downsample the signal by M, we keep every M-th sample and
discard the rest. The process of upsampling adds false high frequency
content to the signal in the form of images, while downsampling
introduces aliasing into the output signal. Signal interpolation is
accomplished by following the upsampler of Fig. 2 with a digital lowpass
anti-imaging filter. To eliminate the aliasing introduced by
downsampling, on the other hand, we must first filter the input signal
x(n) using an appropriate lowpass filter. Suffice it to say that even if
aliasing is eliminated by prefiltering prior to downsampling, the
lowpass filtering operation reduces the information content of the
resulting signal. This is, of course, completely consistent with the
theory of Nyquist sampling [7]. Finally, one notes from the example in
Fig. 3 that any rational (fractional) sampling rate alteration can be
achieved by using the basic integer building blocks shown in Figs. 1
and 2.
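The operations just described can be sketched in a few lines. The helper names, the Hamming-windowed sinc design and the 61-tap filter length are assumptions chosen for this example; a production design would choose the filter to meet explicit passband and stopband specifications.

```python
import numpy as np

def upsample(x, L):
    """Insert L-1 zeros after every input sample."""
    x = np.asarray(x)
    y = np.zeros(len(x) * L, dtype=x.dtype)
    y[::L] = x
    return y

def downsample(x, M):
    """Keep every M-th sample, starting with the first."""
    return np.asarray(x)[::M]

def lowpass_taps(num_taps, cutoff, gain):
    """Hamming-windowed sinc lowpass; cutoff is a fraction of pi rad/sample."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = cutoff * np.sinc(cutoff * n) * np.hamming(num_taps)
    return gain * h / h.sum()

def resample_rational(x, L, M, num_taps=61):
    """Change the sampling rate by L/M: upsample, lowpass filter, downsample."""
    # One filter serves as both anti-imaging and anti-aliasing filter.
    h = lowpass_taps(num_taps, cutoff=1.0 / max(L, M), gain=L)
    return downsample(np.convolve(upsample(x, L), h), M)

t = np.arange(300)
x = np.sin(2 * np.pi * 0.02 * t)       # slowly varying test sinusoid
y = resample_rational(x, L=2, M=3)     # new rate is 2/3 of the original
```

Away from the edges, `y[m]` approximates the sinusoid evaluated at time `(3*m - 30)/2` in original sample units, where 30 is the delay of the 61-tap filter at the upsampled rate.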


Figure 1: Downsampling by 2: every other sample is retained.

Figure 2: Upsampling by 2: a zero is inserted between each pair of
adjacent samples.

Figure 3: Decreases sampling rate by 2/3.

Figure 4: 2-band maximally decimated filter bank (e.g., wavelet).
3. Maximally decimated filter banks

3.1 2-band Systems

Using the multirate operators introduced in Section 2, we can now
describe the maximally decimated filter bank. A 2-band
analysis/synthesis system is shown in Fig. 4, where $H_0(z)$ is a
lowpass filter and $H_1(z)$ is highpass. Note that because of the
downsampling operations performed in the analysis filter bank, the
sampling density in the transform or subband domain is exactly the same
as it was in the original temporal domain. This is especially important
for compression applications because each of the transform coefficients
must be quantized and coded (i.e., converted into bits) for
transmission. This property is also useful in other applications such
as pattern classification and signal scrambling [9], [5].

Using basic multirate identities and a little algebraic manipulation on
the system of Fig. 4, we find that

$$Y(z) = \tfrac{1}{2}\left[H_0(z)G_0(z) + H_1(z)G_1(z)\right] X(z)
       + \tfrac{1}{2}\left[H_0(-z)G_0(z) + H_1(-z)G_1(z)\right] X(-z) \tag{4}$$

[8]. The first term in (4) represents the linear time invariant (LTI)
response of the combined analysis/synthesis system while the second
represents the aliasing introduced into the system by downsampling. This
aliasing can be completely canceled, however, if we select

$$G_0(z) = H_1(-z), \qquad G_1(z) = -H_0(-z). \tag{5}$$

Substituting (5) into (4), we see that

$$Y(z) = T(z)\,X(z) \tag{6}$$

where

$$T(z) = \tfrac{1}{2}\left[H_0(z)H_1(-z) - H_1(z)H_0(-z)\right] \tag{7}$$

is the LTI transfer function of the system. In general, $T(z)$ may
introduce both amplitude and phase distortion into the reconstructed
signal. To achieve perfect reconstruction, $T(z)$ must have the form
$c\,z^{-n_0}$ for a constant $c$ and integer $n_0$.

Perfect reconstruction can be achieved with the appropriate choice of
filters $H_0(z)$ and $H_1(z)$. In particular, assume that $H_0(z)$ is
power symmetric. This implies that

$$\tilde{H}_0(z)H_0(z) + \tilde{H}_0(-z)H_0(-z) = 1 \tag{8}$$

where $\tilde{H}(z) = H^{*}(z^{-1})$, indicating that the filter
coefficients are first time reversed and then conjugated. To force
$T(z)$ to have the desired form $c\,z^{-n_0}$ we can thus select

$$H_1(z) = -z^{-N}\tilde{H}_0(-z). \tag{9}$$

For odd N, (6) reduces to $Y(z) = 0.5\,z^{-N}X(z)$ and perfect
reconstruction is achieved. Substituting (9) into (5), the two synthesis
filters are now given by

$$G_0(z) = z^{-N}\tilde{H}_0(z), \qquad G_1(z) = z^{-N}\tilde{H}_1(z). \tag{10}$$

Note that all four filters in the system are completely determined by
just one of them. The filter bank shown in Fig. 4 was first called the
Conjugate Quadrature Filter Bank when described by Smith and Barnwell in
[10]. It was not truly popularized, however, until Daubechies showed
that such filter banks could be used to build discrete orthogonal
wavelet transforms [11].
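The perfect reconstruction property can be checked numerically. The sketch below is illustrative only: it assumes a real lowpass filter of odd order N, takes the highpass filter as $h_1(n) = (-1)^n h_0(N-n)$ (the time-domain alternating flip associated with (9)), and forms the synthesis filters as in (10). The 4-tap Daubechies coefficients, scaled so that the sum of squares is 1/2 to match the normalization in (8), are an assumed example and the function names are hypothetical.

```python
import numpy as np

def cqf_filters(h0):
    """Derive h1, g0, g1 from a real power-symmetric lowpass h0 of odd order N."""
    h0 = np.asarray(h0, dtype=float)
    N = len(h0) - 1
    if N % 2 == 0:
        raise ValueError("filter order N must be odd")
    n = np.arange(N + 1)
    h1 = (-1.0) ** n * h0[::-1]   # alternate the signs of the time-reversed h0
    g0 = h0[::-1]                 # G0(z) = z^-N H0~(z)
    g1 = h1[::-1]                 # G1(z) = z^-N H1~(z)
    return h1, g0, g1

def analysis_synthesis(x, h0):
    """Run x through the 2-band analysis bank and then the synthesis bank."""
    h1, g0, g1 = cqf_filters(h0)
    y = 0
    for h, g in ((h0, g0), (h1, g1)):
        v = np.convolve(x, h)[::2]      # analysis filter, then downsample by 2
        u = np.zeros(2 * len(v))
        u[::2] = v                      # upsample by 2
        y = y + np.convolve(u, g)       # synthesis filter
    return y

s3 = np.sqrt(3.0)
# Daubechies 4-tap lowpass, scaled so that sum(h0**2) = 0.5.
h0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0
N = len(h0) - 1
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
y = analysis_synthesis(x, h0)
```

With these choices the output is the input scaled by 0.5 and delayed by N = 3 samples, so `y[N:N + len(x)]` matches `0.5 * x` to numerical precision.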

While the constraint given by (9) must be satisfied if one is to create
perfect reconstruction (PR) orthogonal filter banks and wavelets, it is
not required for the creation of non-orthogonal PR systems. In fact, if
one wishes to construct a linear phase PR filter bank (or, equivalently,
a symmetric wavelet), one must generally give up orthogonality and
equation (9). By doing so one can create instead a biorthogonal system
such that

$$H_0(z)G_0(z) + H_0(-z)G_0(-z) = 1. \tag{11}$$

Two different filters must now be designed such that (11) is satisfied;
the remaining two filters are still determined by (5). If we define
$P(z) = H_0(z)G_0(z)$, then (11) becomes

$$P(z) + P(-z) = 1 \tag{12}$$

which can be satisfied by a half-band or Nyquist filter. Thus, one need
only design such a filter and then factor it to create the desired
orthogonal or biorthogonal system [8].
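The half-band condition (12) is easy to inspect in the time domain: written with its center tap at the origin, $P(z)$ must have a center coefficient of 1/2 and zeros at every other even offset. The sketch below checks this for the LeGall 5/3 filter pair, which is an assumed example and not one taken from the paper; the rescaling step is only there to match the normalization of (12).

```python
import numpy as np

h0 = np.array([-1, 2, 6, 2, -1]) / 8.0   # 5-tap analysis lowpass (assumed example)
g0 = np.array([1, 2, 1]) / 2.0           # 3-tap synthesis lowpass (assumed example)

p = np.convolve(h0, g0)                  # coefficients of P(z) = H0(z) G0(z)
centre = len(p) // 2
p = p / (2 * p[centre])                  # scale so the centre tap equals 1/2

# Half-band property: taps at even, nonzero offsets from the centre vanish.
offsets = np.arange(len(p)) - centre
even_offsets = (offsets % 2 == 0) & (offsets != 0)
is_half_band = np.allclose(p[even_offsets], 0.0)
```

Here `p` works out to (-1, 0, 9, 16, 9, 0, -1)/32, so factoring this single half-band filter in a different way would yield a different pair $H_0(z)$, $G_0(z)$ satisfying the same condition.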


Figure 5: Polyphase implementation of cosine modulated M-band filter
bank.

Figure 7: Frequency response of the 8-band analysis/synthesis system whose analysis bank is given by Fig.
3.2 M-Band Systems

Often it is desirable to split a signal into more than 2 frequency
bands. While one can implement an M-band decomposition by successively
applying the 2-band filter bank discussed in the last section, it is
often more efficient to implement it directly. In fact, the most
efficient maximally decimated M-band filter bank capable of achieving
good frequency discrimination is the cosine modulated filter bank [8].
Here, one designs a single lowpass prototype filter and then implements
the remaining filters by using cosine waveforms to modulate this
prototype to higher frequency bands. Figure 5 shows the computationally
efficient polyphase implementation of an M-band cosine modulated filter
bank (the powers of $z^{-1}$ denote delay elements or shift registers).
Note that the filter coefficients belonging to the lowpass prototype
filter $H(z)$ are uniformly distributed amongst the polyphase subfilters
$G_k(z^2)$. Thus, the complexity of implementing all M filters is equal
to that of the original prototype filter $H(z)$ plus the cost of
implementing the $M \times 2M$ transform T. The elements of the
transform matrix T are given by

$$t_{kn} = 2\cos\!\left(\frac{\pi}{M}\,(k + 0.5)\left(n - \frac{N}{2}\right) + \theta_k\right), \tag{13}$$

where $\theta_k = (-1)^k \pi/4$ and N is the order of $H(z)$. This
transform can be implemented very efficiently using a 2M-point FFT plus
a few additional operations.
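To make the modulation concrete, the sketch below builds the M analysis filters directly, by multiplying a lowpass prototype with the cosine terms of (13) evaluated for n = 0, ..., N. This is the direct (non-polyphase) form, not the efficient structure of Fig. 5, and the Hamming-windowed sinc prototype is an assumed stand-in for a properly optimized design.

```python
import numpy as np

def prototype_lowpass(num_taps, M):
    """Hamming-windowed sinc with cutoff pi/(2M); an assumed, non-optimized prototype."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    cutoff = 1.0 / (2 * M)                     # as a fraction of pi rad/sample
    p = cutoff * np.sinc(cutoff * n) * np.hamming(num_taps)
    return p / p.sum()                         # unit gain at DC

def cosine_modulated_bank(p, M):
    """Return an M x len(p) array whose k-th row is h_k(n) = p(n) * t_kn."""
    N = len(p) - 1                             # order of the prototype
    n = np.arange(N + 1)
    k = np.arange(M).reshape(-1, 1)
    theta = (-1.0) ** k * np.pi / 4
    t = 2 * np.cos(np.pi / M * (k + 0.5) * (n - N / 2) + theta)
    return p * t

M = 8
bank = cosine_modulated_bank(prototype_lowpass(128, M), M)

# Locate the peak of each filter's magnitude response on a dense grid over [0, pi].
response = np.abs(np.fft.rfft(bank, 4096, axis=1))
peak_band = np.argmax(response, axis=1) * M // 2048   # band index containing each peak
```

Each row of `bank` is a bandpass filter centered near $(k + 0.5)\pi/M$, so `peak_band` is expected to list the bands in order. A prototype like this one gives neither perfect reconstruction nor optimized alias cancellation; it only illustrates the modulation step.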


Figure  8:  Relationship  between  LOT  and 
modulated  cosine  filter  bank. 

The first filter banks designed along the lines of Fig. 5 were called
pseudo-QMF or generalized QMF banks, and they did not achieve perfect
reconstruction. Instead, they cancel aliasing only between adjacent
frequency subbands, and they introduce a small amount of amplitude
distortion into the reconstructed output. Figure 6 shows the analysis
filters of an 8-band pseudo-QMF bank. Note that only the transition
bands of adjacent filters overlap; the passband and transition bands of
each filter lie in the stopbands of all non-adjacent filters. Thus, it
really is sufficient just to cancel aliasing between adjacent bands.
Figure 7 shows the frequency response of the combined analysis/synthesis
system. While there is some amplitude distortion, its magnitude is
extremely small. Thus, despite its minor flaws, the pseudo-QMF bank has
been widely used in a variety of applications, most notably MPEG (Moving
Picture Experts Group) audio Layers 1, 2, and 3 (MP3). More recently,
various authors have found ways of designing the prototype filter
$H(z)$ so that perfect reconstruction can also be achieved within the
framework of Fig. 5 [12].
Figure 9: 4-band 2D multirate filter bank: one level of wavelet
decomposition.

It should also be noted that the cosine modulated filter bank of Fig. 5
can be specialized to the lapped orthogonal transform (LOT) or Local
Cosine Transform (LCT). To see this, consider Fig. 8. Here, the
polyphase subfilters have been replaced by single scalar coefficients.
These coefficients are simply the window weights required to implement a
LOT having a 50% overlap between adjacent windows. In the figure, the
polyphase network acts exactly like a shift register in which half of
the samples are replaced with new samples every clocking cycle.


4. Applications

4.1 Wavelet-Based Image Compression

The current standards for image and video compression, JPEG (Joint
Photographic Experts Group) and MPEG, are both transform coders centered
around an 8x8 blocked DCT [1], [2]. More recently, image compression
algorithms based on the wavelet transform have gained prominence in both
research and development because of their ability to operate
effectively over a wide range of compression ratios [13], [14]. In fact,
a single wavelet-based algorithm can be constructed which operates
efficiently for both lossless and highly lossy compression [15], [16].
The basic 2-dimensional (2D) wavelet transform used for image
compression is constructed by cascading two 1D filter banks of the type
shown in Fig. 4 so that the first one operates in the vertical direction
and the second in the horizontal direction. The 4-band 2D filter bank
thus created is shown in Fig. 9, where the subscripts denote the
direction of operation of the low (L) and high (H) pass filters. To
create a complete wavelet transform, we iterate the filter bank shown in
Fig. 9, successively decomposing the low-low band. Figure 10 shows the
subband or wavelet coefficient mapping which results from 3 such
iterations. To further illustrate this point, Figure 11 shows the
wavelet decomposition of an actual image.
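The separable construction can be sketched as follows. The Haar filter pair is used purely for brevity and is an assumption of this example (practical coders typically use longer filters); the function names and the band labels, in which the first letter refers to the vertical filter and the second to the horizontal filter, are likewise illustrative.

```python
import numpy as np

def haar_split(a, axis):
    """One 2-band Haar analysis stage along the given axis (even length assumed)."""
    a = np.moveaxis(a, axis, 0)
    low = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    high = (a[0::2] - a[1::2]) / np.sqrt(2.0)
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)

def dwt2_level(image):
    """One level of the 4-band 2D decomposition: vertical first, then horizontal."""
    low_v, high_v = haar_split(image, axis=0)
    ll, lh = haar_split(low_v, axis=1)
    hl, hh = haar_split(high_v, axis=1)
    return ll, (lh, hl, hh)

def wavelet_decompose(image, levels):
    """Iterate dwt2_level on the low-low band; details are listed finest first."""
    ll = np.asarray(image, dtype=float)
    details = []
    for _ in range(levels):
        ll, bands = dwt2_level(ll)
        details.append(bands)
    return ll, details

rng = np.random.default_rng(1)
image = rng.random((64, 64))
ll, details = wavelet_decompose(image, levels=3)
energy = np.sum(ll ** 2) + sum(np.sum(b ** 2) for bands in details for b in bands)
```

Three iterations on a 64x64 image leave an 8x8 low-low band plus nine detail bands. Because the Haar stage is orthonormal, `energy` equals the energy of the input image, and the total number of coefficients is unchanged, as expected for a maximally decimated system.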

The process of wavelet-based compression is illustrated in generic terms
by Fig. 12. First the image is transformed and then the transform
coefficients are appropriately quantized (eliminating information
content) and coded (eliminating redundancy and converting into bits). To
reconstruct an approximation of the image, we decode the bit stream and
perform an inverse discrete wavelet transform (IDWT), which is simply
the synthesis filter bank corresponding to the analysis bank of Fig. 9.


Of particular interest is a class of wavelet-based image compression
algorithms called 'embedded' coders [13]-[16]. In an embedded coder, a
bit stream is generated and transmitted in order of importance, i.e.,
the most important bits are sent first. Figure 13 illustrates this
concept. Embedded bit streams are useful in a number of ways: 1) if
transmission terminates prematurely, we can still reconstruct an image
of the best possible quality with the bits we received, 2) unequal error
protection coding is easily achieved in a dynamic fashion, and 3)
communications channels with fixed bit rates are easily supported. All
of these traits are useful in military applications because the
communications channels of interest are primarily RF and are likely to
be operated in an unstable and noisy environment.
Figure 10: Wavelet coefficient or subband mapping.

Figure 11: Wavelet decomposed image.
While a number of embedded coding algorithms have been developed
[13]-[15], we describe here the embedded zerotree wavelet (EZW)
approach, which was the first developed and is still one of the best
[13]. The fundamental observation around which this coding algorithm is
centered is that there is a strong correlation between insignificant
coefficients at the same spatial locations in different wavelet scales;
i.e., if a wavelet coefficient at a coarser scale is zero (or, more
generally, insignificant with respect to the current threshold), then it
is more likely that the corresponding wavelet coefficients at finer
scales will also be zero. Figure 14 shows a 3-level, 2D wavelet
decomposition and the links which define a single zerotree structure. If
the wavelet coefficient at a given scale is zero along with all of its
descendants (as shown in Fig. 14), then a special symbol indicating a
zerotree root (ZTR) is transmitted, eliminating the need to transmit the
values of the descendants. Thus, the correlation of insignificance
across scales results in a net decrease in the number of bits
transmitted.
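The parent-child links can be sketched in code under some stated assumptions: the coefficients are stored in a single array in the usual nested subband layout, the coefficient at (i, j) has its four children at rows 2i, 2i+1 and columns 2j, 2j+1, and "zero" is interpreted as insignificant with respect to a threshold. The function names are hypothetical, and the special child rule that applies inside the coarsest low-low band is not modeled.

```python
import numpy as np

def descendants(shape, i, j):
    """All descendants of coefficient (i, j) under the 2x2 parent-child rule."""
    rows, cols = shape
    frontier, found = [(i, j)], []
    while frontier:
        children = []
        for r, c in frontier:
            for rr in (2 * r, 2 * r + 1):
                for cc in (2 * c, 2 * c + 1):
                    # Skip positions outside the array and the self-link at (0, 0).
                    if rr < rows and cc < cols and (rr, cc) != (r, c):
                        children.append((rr, cc))
        found.extend(children)
        frontier = children
    return found

def is_zerotree_root(coeffs, i, j, threshold):
    """True if (i, j) and every descendant are insignificant at this threshold."""
    if abs(coeffs[i, j]) >= threshold:
        return False
    return all(abs(coeffs[r, c]) < threshold
               for r, c in descendants(coeffs.shape, i, j))

coeffs = np.zeros((8, 8))
coeffs[0, 0] = 100.0      # coarsest low-low coefficient
coeffs[0, 1] = 40.0       # significant coarse-scale detail coefficient
coeffs[1, 1] = 3.0        # insignificant, with insignificant descendants
```

With a threshold of 32, position (1, 1) is a zerotree root, so a single symbol would stand in for it and its 20 descendants, whereas (0, 1) is significant and must be coded explicitly.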
Figure 12: Wavelet compression and decompression.

Figure 13: Embedded image compression.

Figure 14: Parent-child relationships amongst zerotree coefficients.

Figure 15: Encoder and decoder for an embedded image compression
algorithm.


In order to generate an embedded code (where information is transmitted
in order of importance), Shapiro's EZW algorithm scans the wavelet
coefficients in a bit-plane fashion. Starting with a threshold
determined from the magnitude of the largest coefficient, the algorithm
sweeps through the coefficients, transmitting the sign (+ or -) if a
coefficient's magnitude is greater than the threshold (i.e., it is
significant), a ZTR if it is less than the threshold but the root of a
zerotree at the coarsest possible scale, or a 0 otherwise; this is the
dominant pass. Next, for the subordinate pass, all coefficients deemed
significant in the dominant pass are added to a second subordinate list
which is itself scanned, adding one bit of resolution to the decoder's
representation of each significant coefficient. Symbols generated by
these two passes are then passed through a lossless arithmetic coder to
extract further statistical redundancy. After this, the threshold is
halved and the two passes are repeated, with those coefficients having
been found significant previously being replaced by zeros in the
dominant pass (so that they do not inhibit the formation of zerotrees in
subsequent iterations). The process continues until the bit budget is
exhausted; at this point, the encoder transmits a stop symbol and its
operation is terminated.
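The threshold-halving structure of the two passes can be illustrated with a deliberately simplified sketch. It keeps only the successive-approximation part of the scheme: no zerotree symbols are formed, no arithmetic coding is applied, and no bit stream is produced. The choice of a power-of-two initial threshold, the function name, and the sample coefficients are assumptions made for this example.

```python
import numpy as np

def successive_approximation(coeffs, num_passes):
    """Bit-plane refinement only: no zerotree symbols and no arithmetic coding."""
    c = np.asarray(coeffs, dtype=float)
    mag = np.abs(c)
    threshold = 2.0 ** np.floor(np.log2(mag.max()))   # set from the largest magnitude
    low = np.zeros_like(c)                            # lower bound of each magnitude interval
    high = np.zeros_like(c)                           # upper bound of each magnitude interval
    significant = np.zeros(c.shape, dtype=bool)
    history = []
    for _ in range(num_passes):
        # Dominant pass: newly significant magnitudes lie in [T, 2T).
        newly = ~significant & (mag >= threshold)
        low[newly], high[newly] = threshold, 2 * threshold
        significant |= newly
        # Subordinate pass: one refinement bit halves each significant interval.
        mid = 0.5 * (low + high)
        upper = significant & (mag >= mid)
        lower = significant & ~upper
        low[upper] = mid[upper]
        high[lower] = mid[lower]
        # Decoder's view: interval midpoints with the transmitted signs.
        recon = np.where(significant, np.sign(c) * 0.5 * (low + high), 0.0)
        history.append((threshold, recon))
        threshold /= 2
    return history

coeffs = np.array([63.0, -34.0, 49.0, 10.0, 7.0, -13.0, 3.0, -1.0])
history = successive_approximation(coeffs, num_passes=5)
```

Each entry of `history` holds the threshold used in a pass and the reconstruction available to a decoder that stops after that pass. The maximum reconstruction error stays below the current threshold, which is the sense in which truncating an embedded stream early still yields a usable, coarser image.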

The decoder, on the other hand, simply accepts the bit stream coming
from the encoder, progressively building up the significance map and the
subordinate list in the exact same way as they were created by the
encoder. Because of this precise synchronization, the resolution
enhancement bits transmitted during the subordinate pass do not need any
location specifiers: the decoder knows the exact transmission order of
these bits because it has reconstructed the same subordinate list as the
encoder had at that point in the process. The encoding and decoding
processes described here are illustrated by the block diagrams shown in
Fig. 15. Figure 16 illustrates the advantages of wavelet-based
compression over JPEG at high compression ratios.
Figure 16: JPEG versus EZW at a compression ratio (CR) of 51:1.
Objective comparison metric is peak signal to noise ratio (PSNR): JPEG,
PSNR = 23.8 dB; wavelet, PSNR = 24.5 dB.

Figure 17: (a) Reconstruction of image compressed by 160:1 ratio. Again,
squares have been added to highlight enhanced regions. (b) Error
residual between reconstructed and original images, where white areas in
residual denote large errors.

Figure 18: (a) Compressed with space-frequency weighting by 32:1 ratio;
(b) Compressed without weighting. Note that our texture detection
algorithm is designed to be selective: it only identified the orchard in
the upper left-hand corner of the image but not the one in the lower
right-hand corner.

Figure 19: Generic audio encoder.

One should note that the use of wavelet transforms for image (or video)
compression does not preclude spatial significance weighting. In other
words, if you know that certain regions of the image contain more (or
less) important content, you can have the encoder allocate more (or
less) resolution to those areas. In fact, within the framework of the
wavelet-decomposed image, the encoder can actually control the
allocation of resolution in both space and frequency (within certain
constraints, of course). In [17] we combined an embedded image
compression algorithm with a maximum average correlation height (MACH)
feature detector to allocate resolution within the image frame. Figure
17 shows the result of this feature-based compression algorithm when
four different resolution weightings are available. Similarly, we can
also decrease the resolution of areas that our coder determines to be of
low interest (in order to make more bits available for the rest of the
image). In Fig. 18, we have forced the encoder to discard higher
frequency wavelet coefficients within the orchard on the assumption that
the details of the orchard are not of great interest. Although not
obvious in the figure, the areas outside the orchard are represented
with higher fidelity in Fig. 18b than in Fig. 18a.

4.2 Wideband Audio

Another area in which multirate filter banks have had great success is
wideband audio compression. Because of the high dynamic range of typical
wideband audio signals (e.g., classical music), a highly adaptive
decomposition and bit allocation framework (dynamically adapted to the
signal power levels at a given time and within a given frequency band)
is required. The basic time-frequency decomposition used in virtually
every wideband audio coder is the cosine modulated filter bank in either
the general polyphase form of Fig. 5 or the LOT form of Fig. 8. To
allocate bits to the different frequency bands at a given time, the
algorithm uses knowledge about perceptual masking in the human ear [18].

Figure 19 shows the block diagram of a generic subband (transform)
coding algorithm which can adapt its quantization (and possibly its
decomposition) to optimize the perceived quality of its reconstructed
audio. The dotted lines represent data exchanges which do not occur in
all implementations. For example, all three of the MPEG 1 audio coders
(Layers 1, 2, and 3) use a separate FFT to perform the frequency
analysis required to do the bit allocation (indicating that path a1 is
active), while Dolby AC-3, Philips' digital compact cassette (DCC), and
Sony's MiniDisc use only the outputs of their signal decompositions
(path a2). The 'Perceptual Analysis' block computes the masking
estimates which are required by the 'Bit Allocation' block to ensure
that quantization errors in the reconstructed audio are inaudible. Based
on this analysis of the signal, some audio coders also have the ability
to alter their decompositions and the corresponding coefficient
groupings (path b) to prevent the introduction of pre-echoes into the
decoded audio. Note that the decoder simply inverts the operations of
the encoder block by block to reconstruct an approximation of the input
audio.

Perceptually transparent coding is accomplished primarily by exploiting
the various masking properties of the human ear, specifically: the
absolute threshold of hearing, simultaneous frequency masking, forward
(temporal) masking, and backward masking. First, any frequency component
of the signal whose power falls below the absolute threshold of human
hearing need not be transmitted. This threshold is lowest between 2 and
4 kHz and goes up rapidly above 15 kHz. Next, if a small amplitude tonal
signal occurs at the same time as a larger one of similar frequency, the
smaller signal will be masked. This is called simultaneous masking and
is specified in terms of critical bands which are defined on the Bark
scale [18]. These critical bands define the frequency resolution of the
human auditory system: from 0 to 500 Hz there are 5 uniform critical
bands while above 500 Hz the width of each band expands by approximately
1/3 per octave. The effectiveness of the masking decreases by about
8 dB/Bark for critical bands above the masker and 25 dB/Bark for those
below it, and it also depends strongly on the tonality of the input,
since pure tones mask each other much more effectively than noise-like
signals. To estimate tonality, the Spectral Flatness Measure (SFM),
basically the logarithm of the power spectrum's geometric mean divided
by its arithmetic mean, is generally employed [19]. Specifically, a
ratio of the current SFM to the SFM of a maximally tonal input is used
to compute the tonality of the current block of samples, and this
tonality coefficient biases the masking threshold upward for highly
tonal signals or downward for noise-like signals.

Figure 20: Time-frequency based interference excision. Original signal
enters noisy communications channel at left. Signal output (to the
right) at the top is the bit stream produced if interference is not
excised while the one at the bottom has been denoised. Note the clear
correlation spike in the bottom signal (indicating a good signal lock)
and the lack of such a spike in the upper signal.
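The tonality estimate can be sketched as follows. The power spectrum is taken from a plain FFT of one block, and the reference value of -60 dB for a maximally tonal input is an assumption made for this example rather than a figure given in the text; the function names and the small constant `eps` are likewise illustrative.

```python
import numpy as np

def spectral_flatness_db(block, eps=1e-12):
    """10*log10 of (geometric mean / arithmetic mean) of the block's power spectrum."""
    power = np.abs(np.fft.rfft(block)) ** 2 + eps    # eps avoids log(0)
    geometric = np.exp(np.mean(np.log(power)))
    arithmetic = np.mean(power)
    return 10.0 * np.log10(geometric / arithmetic)

def tonality(block, sfm_db_max=-60.0):
    """Ratio of the block's SFM to an assumed maximally tonal SFM, clipped to [0, 1]."""
    return float(np.clip(spectral_flatness_db(block) / sfm_db_max, 0.0, 1.0))

n = np.arange(1024)
tone = np.sin(2 * np.pi * 64 * n / 1024)                  # bin-centred sinusoid
noise = np.random.default_rng(2).standard_normal(1024)    # white Gaussian noise
```

A flat spectrum gives an SFM near 0 dB and a tonality coefficient near 0, while a single sinusoid gives a strongly negative SFM and a coefficient of 1. How the coefficient is then mapped onto a masking threshold offset depends on the particular psychoacoustic model.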

The final perceptual effects which must be considered in the design of
the coding algorithm are forward and backward temporal masking. Forward
masking occurs when the masking signal ends before the masked signal
begins, while backward masking is the exact opposite. Perceptual studies
have shown that forward masking is the more effective of the two by a
wide margin [18]. While most of the currently available coding
algorithms claim to 'exploit' forward and backward masking, this
statement is somewhat misleading. Explicitly, they exploit simultaneous
masking to achieve bit rate reductions through adaptive bit allocation
while implicitly exploiting forward masking to conceal the effects of
time-frequency blocking on the quantized coefficients. In other words,
if the masking signal contained within the block of coefficients ends
prematurely, the quantization noise will still be concealed. The
situation with backward masking, however, is entirely different since
this phenomenon is highly localized around the leading edge of the
masker. If blocks of coefficients representing a fixed time-frequency
subdivision of the signal are jointly coded, then it is possible for
pre-echo to be introduced into the reconstructed audio by the occurrence
of a large masker in the latter part of a block. Thus, the goal of the
coding algorithm is not so much to exploit backward masking as to
compensate for its limitations. In fact, the entire motivation for using
temporally adaptive transformations in the encoder (path b in Fig. 19)
comes from the need for increased time localization of the quantization
errors during sharp attacks (i.e., sudden increases in the short time
power spectrum of the audio input).


4. 3  Time-Frequency  Interference  Excision 

The same transforms we applied in the last section to
the rather non-military application of wideband audio
compression can also be applied to the problem of
removing narrowband interference from a wideband
signal [20]. To do this, we first transform the signal
using the LOT of Fig. 6 and then analyze the frequency
subbands, looking for any large concentrations of
energy. Since our desired signal is wideband, its
energy will not be particularly concentrated in any
given band. The energy from a narrowband interferer,
on the other hand, will be concentrated into certain
bands, even if its frequency is hopping or chirping
(within certain bounds). Once a band containing a
potential interference source is detected, we alter
its transform coefficients so as to suppress the
interference without introducing new types of
distortion into the signal. Note that this is more
complicated than simply zeroing the offending
coefficients, since these zeros can themselves
introduce false information into the output signal.
Finally, the transform coefficients (possibly
modified) are inverted, and the denoised signal is
then sent on for detection or further processing. A
block diagram of the complete system is shown in
Fig. 20.
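The detect-and-modify loop described above can be sketched as follows. This is only an illustration under stated assumptions, not the system of [20]: a plain orthonormal block DCT stands in for the LOT of Fig. 6, a bin is flagged when its magnitude exceeds a multiple of the block median, and flagged bins are clipped to the median level rather than zeroed. The function names and the threshold factor of 4 are hypothetical choices made for the example.

```python
import math
import random

def dct(x):
    """Orthonormal DCT-II of one block (assumed stand-in for the LOT)."""
    n = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of dct(); the transform is orthonormal, so this is its transpose."""
    n = len(c)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k]
                * math.cos(math.pi * (t + 0.5) * k / n) for k in range(n))
            for t in range(n)]

def excise(block, factor=4.0):
    """Clip bins whose magnitude exceeds factor * median magnitude."""
    coeffs = dct(block)
    mags = sorted(abs(c) for c in coeffs)
    median = mags[len(mags) // 2]
    limit = factor * median
    flagged = [k for k, c in enumerate(coeffs) if abs(c) > limit]
    # Clipping to the median level (instead of zeroing) avoids leaving a
    # conspicuous hole in the spectrum of the wideband signal.
    cleaned = [math.copysign(median, c) if abs(c) > limit else c for c in coeffs]
    return idct(cleaned), flagged

# Wideband signal: pseudo-random +/-1 chips. Interferer: a strong tone that
# falls on DCT bin k0 (chosen this way only to keep the example short).
rng = random.Random(0)
n, k0 = 64, 9
chips = [rng.choice((-1.0, 1.0)) for _ in range(n)]
tone = [10.0 * math.cos(math.pi * (t + 0.5) * k0 / n) for t in range(n)]
received = [c + j for c, j in zip(chips, tone)]

restored, flagged = excise(received)
err_before = sum((r - c) ** 2 for r, c in zip(received, chips))
err_after = sum((o - c) ** 2 for o, c in zip(restored, chips))
```

In this toy setting the interferer's bin is always among the flagged ones, and the residual error after excision is a small fraction of the error before it. A hopping or chirping interferer would spread over several bins and would need the block length and threshold tuned accordingly.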

Superimposed on the excision system in Fig. 20 is an
example illustrating the advantages of excision. The
input spread spectrum signal enters from the left and
is corrupted by noise and narrowband interference
(effects of the communications channel). The upper
branch, which outputs its signal to the right, simply
correlates the corrupted signal with the original
without denoising. Note that the correlation peak is
totally obscured by noise. The signal coming out of
the lower correlator, on the other hand, has had its
interference detected and removed using time-frequency
methods [21]. Here, the correlation peak at the output
is highly pronounced, indicating that we can easily
decode the information bits modulated onto this
waveform.

One can also use a windowed Fourier transform in this
application. The disadvantage of doing this, however,
is summarized by the Balian-Low Theorem: it is
impossible to design a windowed Fourier transform
which simultaneously achieves good time and frequency
localization but does not expand the sampling rate of
the signal in the transform domain. Thus, to prevent
the excision process from introducing artifacts into
the reconstructed signal, one must generate and
process approximately 50% more transform domain
coefficients using a Fourier transform than with the
lapped orthogonal cosine transform.

Figure 21: COFDM transmission system.


4.4 Coded Orthogonal Frequency Division
Multiplexing (COFDM)

The basic idea of COFDM is to combine a large number
of low bandwidth information sources into a single
wide bandwidth signal for transmission [6]. This can
be accomplished using the dual structure to the PR
filter bank: the perfect transmultiplexer. Such a
system is shown in Fig. 21, where M narrow bandwidth
signals are combined into one signal having M times
more bandwidth. Note that while each of the M input
signals is mostly confined to a specific frequency
band within the combined signal, there is some overlap
because the digital filters are not ideal. Despite
this, if the filters Fk(z) and Hk(z) are designed
correctly, all cross-talk will be cancelled and no
amplitude or phase distortion will be introduced into
the output signals: i.e., yk(n) = xk(n) for all k. It
is easily shown that if the filters Fk(z) are the
synthesis filters for an M-band PR filter bank and
Hk(z) are the corresponding analysis filters, then the
outputs of Fig. 21 will exactly equal its inputs,
resulting in a perfect transmultiplexer.
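The property yk(n) = xk(n) can be checked numerically on the simplest possible case. The sketch below assumes an orthonormal block DCT as the M-band PR filter bank, i.e. filters of length M with no overlap between blocks, which is a degenerate case and not the longer filters of Fig. 5 or Fig. 6. The names `dct_basis`, `multiplex`, and `demultiplex` are hypothetical.

```python
import math

def dct_basis(m):
    """Rows are M orthonormal DCT-II vectors, used here as the filters f_k(n)."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / m)
             * math.cos(math.pi * (n + 0.5) * k / m) for n in range(m)]
            for k in range(m)]

def multiplex(channels):
    """Synthesis side: each input sample time yields M composite samples."""
    m = len(channels)
    f = dct_basis(m)
    composite = []
    for t in range(len(channels[0])):
        composite.extend(sum(channels[k][t] * f[k][n] for k in range(m))
                         for n in range(m))
    return composite

def demultiplex(composite, m):
    """Analysis side: project each block of M samples back onto the f_k."""
    f = dct_basis(m)
    blocks = [composite[i:i + m] for i in range(0, len(composite), m)]
    return [[sum(f[k][n] * b[n] for n in range(m)) for b in blocks]
            for k in range(m)]

# Four low-rate inputs x_k(n), three samples each (illustrative values).
x = [[1.0, -1.0, 1.0],
     [0.5, 0.5, -0.5],
     [2.0, 0.0, -2.0],
     [-1.0, 1.0, 1.0]]
composite = multiplex(x)          # one signal at M times the input rate
y = demultiplex(composite, len(x))
```

Because the basis is orthonormal, `y` reproduces `x` to within floating-point rounding, while `composite` carries all four channels in a single stream of 12 samples.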

If the impulse responses of the set of filters {fk(n)}
are all mutually orthogonal, then the system
illustrated by Fig. 21 is truly COFDM. Note that
orthogonality of the synthesis filter set {fk(n)} also
guarantees orthogonality of the analysis filter set
{hk(n)}. If a PR cosine modulated filter bank of the
type shown in Fig. 5 is used here, then the filter set
{fk(n)} is indeed orthogonal. Higher order filters
result in better frequency confinement of the original
signal xk(n) within the composite signal, but they
also increase the complexity of the system. To make M
as large as possible for a given implementation
complexity, the LOT of Fig. 6 is often used. This
results in relatively poor frequency localization of a
given narrowband input within the broadband composite
signal but does not appear to cause any serious
problems within the context of this application.

Europe has already selected this modulation scheme for
digital radio broadcasting and will likely select it
for digital television as well. Why? A major advantage
is that it is much more robust to multipath phenomena
and other forms of frequency-dependent interference.
Specifically, one can adjust the information flow rate
going into each of the M input channels so that every
channel operates at exactly its maximum capacity.
Thus, a channel that is subject to a great deal of
interference would have much less data capacity than
one which is not. The data rate for each channel can
be adjusted by altering the amount of error correction
used in it; i.e., the low-rate, interference-prone
channel expends most of its raw capacity on error
protection, while the high-rate, interference-free
channel uses more of its capacity to transmit actual
data bits. In some cases, there might also be a
feedback path from the receiver to the transmitter,
allowing the system to adapt to the changing RF
environment. Note that numerous variations on this
concept have been proposed, including wavelet
modulation and discrete tone modulations. Both of
these use the same basic concept as COFDM, but with
different modulation kernels [22].

4.5  Time-Frequency  Scrambling 

It has been shown in [5] how multirate filter banks
can be used very effectively to implement voice
scrambling systems based on time-frequency
permutation. The most general form of such a system is
shown in Fig. 22, where P(z) is the permuter (it is a
function of z because it has, in general, memory).
Since the subband coefficients are only reordered by
P(z) and not, in theory, quantized, any perfect
reconstruction filter bank can be used for the
analysis and synthesis operations in Fig. 22 without
loss in the quality of the reconstructed signal. If
the signal input to the encoder, x(n), has a frequency
distribution which is known to the pirate (e.g.,
mostly lowpass), then the pirate will be able to use
this knowledge to more easily break the permuter's
scrambling algorithm. This liability can be
eliminated, at considerably higher cost, by passing
the permuted signal from P(z) through the synthesis
bank and transmitting the output. In this case, the
decoder must first pass its received signal through an
analysis bank, process it with the inverse permuter,
P(z)^{-1}, and, finally, reconstruct the unscrambled
signal. If the filter banks used in the process are
perfect reconstruction, the unscrambled signal will
still match the original at the end of this process
(barring multiplication roundoff and coefficient
quantization errors).

Figure 22: Time-frequency scrambling.

Time-frequency scrambling using maximally decimated
multirate filter banks has a number of advantageous
properties. Obviously, it obscures the signal in both
time and frequency, making it very difficult to decode
even if the opponent has a priori knowledge about the
statistics of the signal (at least as long as the
filters and synchronization intervals are not known
exactly). Furthermore, since the filter bank is
maximally decimated, the scrambler does not increase
the bit rate of the signal. On the minus side, the
delay introduced by the permuter P(z) for any given
sample must be limited to some maximum value for
real-time applications. With 2-way voice
communications, for example, it becomes impossible to
carry on a conversation if the throughput delay or
latency grows too large. If little system latency is
allowed, the delay constraint on P(z) permits an
opponent to limit his key search and thus decode the
data more quickly. Time-frequency scrambling systems
are most useful for protecting data whose importance
is highly time dependent, i.e., data which is
immediately useful but has little long-term value. For
other kinds of sensitive data, such time-frequency
systems probably do not provide adequate protection.
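The analysis, permutation, synthesis chain of this section, together with its inverse at the decoder, can be sketched as follows. This is a minimal illustration under assumptions: the filter bank is an orthonormal block DCT, the permuter is memoryless and acts on one frame of coefficients, and the key is a fixed reversal chosen only so that the example is deterministic. All function names are hypothetical.

```python
import math

def basis(m):
    """Orthonormal DCT-II vectors used as a simple PR filter bank."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / m)
             * math.cos(math.pi * (n + 0.5) * k / m) for n in range(m)]
            for k in range(m)]

def analyze(x, m):
    """Analysis bank: subband coefficients, listed block by block."""
    f = basis(m)
    return [sum(f[k][n] * x[b + n] for n in range(m))
            for b in range(0, len(x), m) for k in range(m)]

def synthesize(c, m):
    """Synthesis bank: exact inverse of analyze()."""
    f = basis(m)
    return [sum(f[k][n] * c[b + k] for k in range(m))
            for b in range(0, len(c), m) for n in range(m)]

def scramble(x, perm, m):
    """Encoder: reorder time-frequency coefficients, then resynthesize."""
    c = analyze(x, m)
    return synthesize([c[p] for p in perm], m)

def descramble(y, perm, m):
    """Decoder: analysis bank, inverse permutation, synthesis bank."""
    c = analyze(y, m)
    restored = [0.0] * len(c)
    for dst, src in enumerate(perm):
        restored[src] = c[dst]
    return synthesize(restored, m)

m = 4
x = [math.sin(0.3 * t) for t in range(16)]   # frame length is a multiple of m
key = list(reversed(range(len(x))))          # stand-in for a secret permutation
tx = scramble(x, key, m)
rx = descramble(tx, key, m)
```

The transmitted frame `tx` has the same number of samples as `x`, which reflects the point that a maximally decimated bank does not increase the rate, and `rx` matches `x` up to rounding. The frame length bounds the delay, which is exactly the constraint that limits the key space in a real-time system.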

5. Conclusions

We have discussed some of the basic concepts behind
digital multirate systems, focusing on the area of
maximally decimated filter banks and wavelets. In
particular, we have analyzed the 2-band PR filter bank
which implements orthogonal and biorthogonal wavelets.
We have also considered the important cosine modulated
filter bank, a special case of which is the lapped
orthogonal transform or local cosine transform.
Finally, we have summarized a number of
communications-related applications for such filter
banks, including wavelet-based image compression,
wideband audio compression, noise excision,
modulation, and scrambling. Of course, many other
applications for such multirate systems also exist,
including adaptive filtering, pattern recognition, and
channel equalization. In short, multirate digital
signal processing plays an important role in modern
communications systems, and it will likely become even
more critical as digital connectivity increases in the
future.

Summary: In most of the requirements met in

situation assessment, multisensor analysis has to be
able to recognize in adverse conditions one situation out
of a set of possibilities concerning for instance either
localization, identity, or matching hypotheses. To do
so, it uses measurements of more or less doubtful origin
and prior knowledge that is understood to be often
poorly defined, and whose validity is moreover difficult
to evaluate under real observation conditions. The
present synthesis proposes a generic modeling of this
type of information in the framework of the theory of
evidence, with closer attention being paid to the
different natures of data processed in common cases.
This modeling is then used to elaborate processing
methods able to face specific problems that may arise
when multisensor systems are implemented to achieve
functions like detection, classification, matching of
ambiguous observations, or tracking. Crucial practical
problems are more specifically dealt with, such as
appropriate combination processing and decision
making, management of heterogeneous frames of
discernment, and integration of contextual knowledge.
Furthermore, the interest of a global functional approach
at low level, possible in that framework, is emphasized.


1  Data  Uncertainty  in  Multisensor  Systems 

Sensors are mainly associated in order to benefit from
their complementarity. Different kinds of advantages may
be expected:

- ability to face a larger set of situations, as one
sensor may be effective while another is not because of
particular countermeasures, physical phenomena,
conditions of observation, or lack of suitable
knowledge (learning,...);

- saving of time thanks to task sharing and cooperation
between specific functions;

- improved discrimination capacity as a result of
combining observations when only partial information is
locally available (classification, localization,...).

Consequently, when analyzing a situation, the available
sensors are most often used under conditions that induce
uncertainties at different levels:

- measurements may be imprecise, erroneous,
incomplete, or ill-suited to the problem,

- observations may be ambiguous, either in space or in
time (e.g., position, velocity, or feature measurements
provided by two different sensors are not necessarily
related to the same object),

- prior knowledge (generated by learning, models,
descriptions, and so forth) may be incomplete, poorly
defined, and above all more or less representative of
reality, in particular in light of the varying context.

Moreover,  the  disparity  of  the  data  delivered  by  the 
various  sensors,  which  is  intended  to  remedy  the 
individual  insufficiencies  of  each,  requires  a  detailed 
evaluation  of  each  of  them,  based  on  any  exogenous 
information  that  might  characterize  their  pertinence  to  the 
problem  at  hand  and  the  context  investigated,  while  such 
information  is  itself  often  very  subjective  and  imprecise. 

Theories  of  uncertainty  offer  an  attractive  federative 
framework  in  this  context.  But  they  run  up  against  a 
certain  number  of  difficulties  in  practice  :  interpretation 
and  modeling  of  the  available  information  in  appropriate 
theoretical  frameworks,  choice  of  an  association 
architecture  and  combination  rules,  decision  principles  to 
be  adopted,  constraints  concerning  the  speed  and  volume 
of  the  necessary  computations. 

To  provide  solutions  to  these  questions,  we  will  first 
consider  a  generic  problem  in  which  we  attempt  to 
characterize  the  likelihood  of  I  hypotheses  Hi 
theoretically  listed  in  an  exhaustive  and  exclusive  set  E. 
These  hypotheses  may  typically  concern  the  presence  of 
entities,  target  or  navigation  landmark  identities,  vector 
or  target  localization,  or  the  status  of  a  system  or  of  a 
situation. 

Such  a  likelihood  function  may  then  be  integrated  either 
into  : 

-  a  choice  strategy,  to  declare  the  most  likely 
hypothesis  (target  identification,  intelligence,  and  so  on), 

-  a  filtering  process  (such  as  target  tracking  or  navigation 
updating), 

- a decision aid process for implementing means of
analysis, electronic warfare, or intervention.

Monterey, USA, 9-10 November 1999, and published in RTO EN-7.




The  likelihood  functions  we  want  have  to  be  developed 
from  the  data  provided  by  J  sensors  Sj.  Each  of  them  is 
assumed  to  be  associated  with  processes  that  extract  a 
measurement  or  a  set  of  measurements  sj,  pertinent  to 
the  targeted  discrimination  function,  from  the  raw  signals 
or  images  it  generates. 

The  developments  presented  are  conducted  in  the  theory 
of  evidence  framework  [1],  which  provides  the  broadest 
and  best-suited  tools  for  the  interpretation  and  the 
processing  of  the  data  considered.  It  is  also  the  most 
federative  in  terms  of  synergy  between  the  different 
theoretical  frameworks  that  may  be  involved  together. 
Appendix  A  gives  a  few  basic  notions  that  will  be  used 
in  the  following. 

We first propose a common solution to the generic
modeling problem introduced above, which is afterwards
particularized when closer attention is paid to the
different natures of data processed in common cases. This
modeling is then used to elaborate suitable classification
methods, thanks to appropriate combination processing
and decision making. Furthermore, specific problems
met in multisensor systems are considered, such as
management of heterogeneous frames of discernment,
integration of contextual knowledge, and matching of
ambiguous observations. Finally, the interest of a global
functional approach at low level, possible in that
framework, is emphasized with the implementation of a
tracking process that directly manages discrimination
features.


2  Modeling  of  Input  Data 

2.1  Generic  Model 

In the framework of the generic problem we will be
considering first, we assume that each measurement sj
can be used to generate I criteria Cij, on the basis of any
a priori knowledge, having values in [0,1] and capable of
characterizing the likelihood of each hypothesis Hi. A
quality factor qij with values in [0,1] is also associated
with each likelihood Cij. Its purpose is to express the
aptitude of the criterion Cij to discriminate the
hypothesis Hi under the given observation conditions,
on the basis of a dedicated learning process or exogenous
knowledge. This factor mainly reflects the confidence
that can be accorded to the validity of the a priori
knowledge used for generating Cij. As concerns, for
example, the representativity of a learning process in a
varying context, it will typically depend on the quality,
volume, and suitability of the available preliminary data
as regards the situation actually met.

Furthermore, we consider here the practical case of
interest when the criteria Cij are generated by separate
information channels, in agreement with the fact that
they are characterized by different levels of reliability qij.
We also assume that we are in the most frequently
encountered context where the criteria Cij taken
separately are always at least of refutation value, in the
sense that, when zero, this guarantees that the associated
hypothesis Hi is not verified.

This leads to a formal construction of the problem on the
basis of two axioms [2], [3]:

Axiom 1: Each of the I*J pairs [Cij, qij] constitutes a
distinct source of information having the focal elements
Hi, ¬Hi, and E, in which the frame of discernment E
represents the set of the I hypotheses Hi.

Axiom 2: When Cij = 0, Cij being valid (qij = 1), we
can assert that Hi is not verified.

Axiom 1 requires that I*J mass functions mij(.) be
generated from the I*J respective pairs [Cij, qij]. For
each, the mass of focal elements Hi, ¬Hi, and E is at first
defined by the value of the corresponding criterion Cij,
which can be interpreted only in terms of credibility or
plausibility of Hi. Axiom 2 then limits the number of
allowable interpretations to two. The first interpretation
leads to:

    Cr_{ij}(H_i) = 0  and  Pl_{ij}(H_i) = C_{ij}        (2.1)

and the second to:

    Cr_{ij}(H_i) = Pl_{ij}(H_i) = C_{ij}                (2.2)

Then, including the confidence factor qij for Cij by
discounting at the rate (1 - qij) provides the desired mass
function mij(.). This leads to the two possible models:

Model 1:

    m_{ij}(H_i) = 0                                     (2.3)

    m_{ij}(\neg H_i) = q_{ij} (1 - C_{ij})              (2.4)

    m_{ij}(E) = 1 - q_{ij} (1 - C_{ij})                 (2.5)

Model 2:

    m_{ij}(H_i) = q_{ij} C_{ij}                         (2.6)

    m_{ij}(\neg H_i) = q_{ij} (1 - C_{ij})              (2.7)

    m_{ij}(E) = 1 - q_{ij}                              (2.8)

A mass function m(.) synthesizing all the evaluations is
then obtained by computing the orthogonal sum of the
different mass functions mij(.) in the framework of each
model:

    m(.) = \bigoplus_{i,j} m_{ij}(.)                    (2.9)

It should be noted that Model 1 is consonant.
Furthermore, it minimizes the specificity measure [4].

The practical determination of the Cij and qij terms is of
course a problem specific to the type of application at
hand. The remainder of this section 2 provides
expressions of Cij for the different natures of data
processed in common situations. The determination of
qij is discussed in sections 3 and 5.
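To make the two generic models and the orthogonal sum (2.9) concrete, here is a minimal sketch. It assumes that the orthogonal sum is Dempster's rule of combination, as in the theory of evidence [1], and represents focal elements as frozensets of hypothesis labels. The function names, the labels "H1" to "H3", and the numerical values are hypothetical.

```python
from functools import reduce
from itertools import product

def mass_model1(hyp, frame, c, q):
    """Model 1, (2.3)-(2.5): mass only on not-H_i and on the frame E."""
    return {frame - {hyp}: q * (1.0 - c), frame: 1.0 - q * (1.0 - c)}

def mass_model2(hyp, frame, c, q):
    """Model 2, (2.6)-(2.8): mass on H_i, not-H_i, and E."""
    return {frozenset({hyp}): q * c,
            frame - {hyp}: q * (1.0 - c),
            frame: 1.0 - q}

def orthogonal_sum(m1, m2):
    """Dempster's rule: conjunctive combination followed by renormalization."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb          # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

def plausibility(m, hyp):
    """Pl(H_i): total mass of focal elements compatible with H_i."""
    return sum(w for s, w in m.items() if hyp in s)

# One sensor, three hypotheses, fully reliable criteria (q = 1).
frame = frozenset({"H1", "H2", "H3"})
criteria = {"H1": 0.9, "H2": 0.4, "H3": 0.0}
sources = [mass_model1(h, frame, c, 1.0) for h, c in criteria.items()]
fused = reduce(orthogonal_sum, sources)
```

With these values the criterion for H3 is zero and fully trusted, so the fused plausibility of H3 is zero, which is the behaviour required by Axiom 2. Swapping `mass_model1` for `mass_model2` gives the second model with the same combination code.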

2.2  Model  With  Statistical  Learning 

We  have  to  consider  different  kinds  of  relation  between 
the  sensor  observations  and  the  discrimination  features 
that  are  characterized  by  previous  learning. 

2.2.1  Precise  and  Reliable  Observation 

The problem dealt with here assumes that each of the
measurements sj is directly one of the discrimination
features exploited or a deterministic function of it, so that
for each of them a learning of their a priori probability
distribution p(sj/Hi), under the various hypotheses Hi, is
available. Most systems do in fact allow a certain
number of preliminary measurements in different real or
simulated situations, from which histograms can be
generated to get a numerical or analytical model of the
distributions p(sj/Hi). The I*J values of probability
density p(sj/Hi) associated respectively with the J local
measurements sj constitute the inputs for the processes
discussed hereafter.

If we consider the most common case, where the
measurements sj can be assumed to be statistically
independent, since the sensors are generally chosen for
the complementary nature of the data they generate, the
likelihood of each hypothesis Hi can be established
immediately by the Bayesian approach, which typically
calls for an evaluation of the a posteriori probability
P(Hi/s1,...,sJ) of each hypothesis Hi using:

    P(H_i / s_1,...,s_J) =
        \frac{ [\prod_j p(s_j/H_i)] P(H_i) }
             { \sum_k [\prod_j p(s_j/H_k)] P(H_k) }     (2.10)

in which P(Hi) designates its a priori probability.
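As a worked instance of (2.10), the sketch below evaluates the a posteriori probabilities for two hypotheses and two independent measurements. The function name and the numerical values are illustrative assumptions, not data from the text.

```python
import math

def posterior(likelihoods, priors):
    """Evaluate (2.10).

    likelihoods[i][j] holds p(s_j / H_i) for the observed s_j;
    priors[i] holds P(H_i). Returns the list of P(H_i / s_1,...,s_J).
    """
    joint = [math.prod(row) * p for row, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [v / total for v in joint]

# Two hypotheses, two sensors, equal priors (illustrative values).
post = posterior([[0.8, 0.5],
                  [0.2, 0.5]], [0.5, 0.5])
```

Here the second sensor is uninformative (equal densities under both hypotheses), so the result is driven entirely by the first one: `post` is approximately [0.8, 0.2].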

However,  this  kind  of  approach  quickly  runs  into 
difficulty  when  the  real  observation  conditions  differ  from 
the  available  learning  conditions,  or  when  the 
measurement  bank  is  not  sufficient  for  a  suitable  learning 
process.  The  lack  of  control  that  can  be  seen  at  this  level 
in  most  applications  leads  to  distribution  models  that 
turn  out  to  be  more  or  less  representative  of  the  data 
actually encountered. In addition, it is often difficult to
find a set of a priori probabilities P(Hi) capable of
reflecting the real situation with fidelity.

So we want to find a modeling based solely on the
knowledge of p(sj/Hi) and capable of integrating any
information concerning the reliability of the various
distributions, whether this comes from a more or less
partial knowledge of the observation conditions or from
a qualification of a data bank.

According to the generic approach introduced in section
2.1, any available qualitative information is assumed to
be synthesized in the form of I*J coefficients qij ∈ [0,1],
each being representative of a degree of confidence in the
knowledge of each of the I*J distributions p(sj/Hi).

Dealing with this problem in the terms of evidence
theory requires finding, for each source Sj, a model of its
I a priori probabilities p(sj/Hi) and their I respective
confidence factors qij in the form of a mass function
mj(.), characterized by a credibility function Crj(.), and
by a plausibility function Plj(.). Since the sources Sj are
distinct, a global evaluation m(.) can then be obtained by
computation of the orthogonal sum of the mj(.). The
appropriate frame of discernment is of course the set of
the I a priori listed hypotheses Hi.

To do this, we propose to conduct an exhaustive and
exact search of all the models that might satisfy three
fundamental axioms [2], [5]. These three axioms are
chosen beforehand on the basis of their legitimacy in
most of the applications concerned. They are:

Axiom 3: Consistency with the Bayesian approach in
the case where the learned distributions p(sj/Hi) are
perfectly representative of the densities actually
encountered (qij = 1, ∀ i,j) and where the a priori
probabilities P(Hi) are known.

Axiom 4: Separability of the evaluation of the
hypotheses Hi; that is, each probability p(sj/Hi) must be
considered as a distinct source of information generating
a particular mass function mij(.), mainly capable of
integrating the confidence factor qij specific to it. We
thus require that each mass function mj(.) be the
orthogonal sum of the I mass functions mij(.) considered
for i ∈ [1, I]. Also, considering the way the p(sj/Hi)
probabilities are generated, the focal elements of the mass
function mij(.) can be only Hi, ¬Hi, or E, where the
frame of discernment E is the set of hypotheses Hi.

Axiom 5: Consistency with the probabilistic association
of the sources; for independent sources Sj and densities
p(sj/Hi) perfectly representative of reality, the modeling
procedures retained must lead to the same result if we
compute the orthogonal sum of the mj(.) modeled from
the p(sj/Hi) or if we model directly the joint
probabilities p(s1,...,sJ/Hi) given by:

    p(s_1,...,s_J / H_i) = \prod_j p(s_j/H_i)           (2.11)

The search for models satisfying these three axioms is
presented in appendix B by progressively restricting the
set of possible models, taking the axioms into account
in the order stated.

It appears that only two models satisfy the three desired
axioms. Both meet the decomposition:

    m_j(.) = \bigoplus_i m_{ij}(.)                      (2.12)

Model 1 is particularized by:

    m_{ij}(H_i) = 0                                     (2.13)

    m_{ij}(\neg H_i) = q_{ij} \{ 1 - R_j p(s_j/H_i) \}  (2.14)

    m_{ij}(E) = 1 - q_{ij} + q_{ij} R_j p(s_j/H_i)      (2.15)

and Model 2 by:

    m_{ij}(H_i) = q_{ij} R_j p(s_j/H_i) / \{ 1 + R_j p(s_j/H_i) \}   (2.16)

    m_{ij}(\neg H_i) = q_{ij} / \{ 1 + R_j p(s_j/H_i) \}             (2.17)

    m_{ij}(E) = 1 - q_{ij}                              (2.18)

In both cases, the normalization factor Rj is simply
constrained by:

    R_j \in [ 0, ( \max_{s_j, i} \{ p(s_j/H_i) \} )^{-1} ]           (2.19)

Nevertheless,  the  specificity  of  the  function  used  to 
generate  model  2  allows  Rj  to  be  simply  a  positive 
number  for  this  model  in  practice. 

We may verify that this result is in fact a special case of
the generic solution discussed in section 2.1. Both
models provided by (2.3) to (2.5) and (2.6) to (2.8) in
section 2.1 are strictly equivalent to the two models
found here in (2.13) to (2.15) and in (2.16) to (2.18), if
we adopt the following respective definitions for the Cij:

    for model 1:  C_{ij} = R_j p(s_j/H_i)                            (2.20)

    for model 2:  C_{ij} = R_j p(s_j/H_i) / [ 1 + R_j p(s_j/H_i) ]   (2.21)

in which Rj is still, of course, the normalization gain
constrained by (2.19).
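The equivalence stated here is easy to check numerically. The sketch below substitutes (2.20) and (2.21) into the generic models of section 2.1 and compares the result with (2.13) to (2.18) written out directly. The function names and the test values are hypothetical; the only constraint used is that Rj * p(sj/Hi) does not exceed 1, in line with (2.19).

```python
def criterion_model1(p, r):
    """C_ij for Model 1, equation (2.20)."""
    return r * p

def criterion_model2(p, r):
    """C_ij for Model 2, equation (2.21)."""
    return r * p / (1.0 + r * p)

def generic_masses(model, c, q):
    """Section 2.1 masses as a tuple (m(H_i), m(not H_i), m(E))."""
    if model == 1:
        return (0.0, q * (1.0 - c), 1.0 - q * (1.0 - c))
    return (q * c, q * (1.0 - c), 1.0 - q)

def statistical_masses(model, p, r, q):
    """Masses written directly from (2.13)-(2.15) or (2.16)-(2.18)."""
    if model == 1:
        return (0.0, q * (1.0 - r * p), 1.0 - q + q * r * p)
    return (q * r * p / (1.0 + r * p), q / (1.0 + r * p), 1.0 - q)

p, r, q = 0.8, 0.5, 0.9            # illustrative values with r * p <= 1
pairs = []
for model, crit in ((1, criterion_model1), (2, criterion_model2)):
    pairs.append((generic_masses(model, crit(p, r), q),
                  statistical_masses(model, p, r, q)))
```

For both models the two tuples in each pair agree to within rounding, and each sums to one, as a mass function must.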

This outcome is in fact legitimate if we note that Axiom
1 is expressed directly by Axiom 4, and that the
solutions required by Axioms 3 and 5 automatically
verify Axiom 2. Axioms 3 and 5 simply make it
possible to specify the inclusion of the particular
information p(sj/Hi) in the expression of the criterion
Cij.

Lastly, when the data sj are discrete values (local
identity declarations, for example), the generalized Bayes
theorem defined by P. SMETS in the framework of
evidence theory [6] can be applied, for the case of
statistical learning, to the Cartesian product between the
set of data and the set of hypotheses. It then strictly
yields Model 1 developed here. Correspondingly, we
note that Model 1, once again, minimizes the
specificity measure [4].

2.2.2  Uncertain  Observation 

We assume now that the measurements sj are uncertain
observations of discrimination features uj, so that only
their a priori probability distribution p(sj/uj) is known.
This may simply be the classical characterization of a
measurement error. Furthermore, for each of the
discrimination features uj themselves, a learning of their
a priori probability distribution p(uj/Hi) under the
various hypotheses Hi is available, as before.

If we note that:

    p(s_j/H_i) = \int p(s_j/u_j) p(u_j/H_i) du_j        (2.22)

then the results of section 2.2.1 can be directly extended
so that the generic solution provided in section 2.1
holds with:

    for model 1:  C_{ij} = R_j \int p(s_j/u_j) p(u_j/H_i) du_j       (2.23)

    for model 2:  C_{ij} = \frac{ R_j \int p(s_j/u_j) p(u_j/H_i) du_j }
                                { 1 + R_j \int p(s_j/u_j) p(u_j/H_i) du_j }   (2.24)

where the normalization factor Rj is now constrained by:

    R_j \in [ 0, ( \max_{s_j, i} \{ \int p(s_j/u_j) p(u_j/H_i) du_j \} )^{-1} ]   (2.25)
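The integral (2.22) rarely needs special machinery in practice. As a sketch, assume a Gaussian measurement error for p(sj/uj) and a Gaussian learned distribution for p(uj/Hi); both choices, the function names, and the numerical values are assumptions made for illustration. In that case the integral has a known closed form (a Gaussian whose variance is the sum of the two variances), which gives a check on the numerical evaluation.

```python
import math

def gauss(x, mean, std):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def likelihood(s, feat_mean, feat_std, err_std, steps=4000):
    """Trapezoidal evaluation of (2.22).

    p(s/u)  is taken as Gaussian with mean u and deviation err_std;
    p(u/H)  is taken as Gaussian with mean feat_mean and deviation feat_std.
    """
    span = 8.0 * max(feat_std, err_std)
    lo = min(s, feat_mean) - span
    hi = max(s, feat_mean) + span
    h = (hi - lo) / steps
    total = 0.0
    for n in range(steps + 1):
        u = lo + n * h
        weight = 0.5 if n in (0, steps) else 1.0
        total += weight * gauss(s, u, err_std) * gauss(u, feat_mean, feat_std)
    return total * h

numeric = likelihood(0.7, 0.0, 1.0, 0.5)
closed_form = gauss(0.7, 0.0, math.sqrt(1.0 ** 2 + 0.5 ** 2))
```

The value `numeric` is what would be multiplied by Rj to obtain Cij in (2.23); it agrees with `closed_form` to well below any practical tolerance.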

2.2.3  Imprecise  Observation 

In this case the measurements sj are imprecise
observations of the discrimination features uj, so that
they only provide a fuzzy membership function μj(uj).
For each of the discrimination features uj, a learning of
their a priori probability distribution p(uj/Hi) under the
various hypotheses Hi remains available, as before.

We can here express:

    p(s_j/H_i) = \int \mu_j(u_j) p(u_j/H_i) du_j        (2.26)

Once more the results of section 2.2.1 can be directly
extended, and the generic solution provided in section
2.1 holds with:

    for model 1:  C_{ij} = R_j \int \mu_j(u_j) p(u_j/H_i) du_j       (2.27)

    for model 2:  C_{ij} = \frac{ R_j \int \mu_j(u_j) p(u_j/H_i) du_j }
                                { 1 + R_j \int \mu_j(u_j) p(u_j/H_i) du_j }   (2.28)

The normalization factor Rj is now constrained by:

    R_j \in [ 0, ( \max_{s_j, i} \{ \int \mu_j(u_j) p(u_j/H_i) du_j \} )^{-1} ]   (2.29)

2.3  Model  With  Approximate  Prior  Knowledge 

Once  again  we  have  to  consider  the  different  kinds  of 
relation  between  the  sensor  observations  and  the 
discrimination  features,  whereas  the  latter  are  now 
characterized  by  approximate  prior  knowledge. 

2.3.1  Precise  and  Reliable  Observation 

In this case each of the measurements sj is directly one of
the discrimination features exploited, or a deterministic
function of it. Nevertheless, the characterization of the
different hypotheses Hi in the feature space is now
provided by prior knowledge in the form of fuzzy
membership functions μi(sj). This means that a
hypothesis Hi represents for instance an object about
which we only know that it is large, slow, or heavy, as
regards respectively the size, speed, or weight space.

To elaborate a suitable model, we have first to consider,
for each hypothesis Hi, α-cuts A_αk of μi(sj) at different
decreasing levels αk. As each α-cut defines a set that
includes the previous one, it leads to the following
consonant mass function on the measurement space:

    m(A_{\alpha_k} / H_i) = \alpha_k - \alpha_{k+1}     (2.30)

The Generalized Bayes Theorem defined by P. Smets in the framework of evidence theory [6] may then be applied to obtain a mass function on E for each of the measurements sj:

$$m(H_i/s_j) = 0 \tag{2.31}$$

$$m(\neg H_i/s_j) = 1 - \mu_i(s_j) \tag{2.32}$$

$$m(E/s_j) = \mu_i(s_j) \tag{2.33}$$

This mass function is then discounted at the rate (1−qij), where qij represents our degree of confidence in the prior knowledge. This provides the model we are looking for:

$$m_{ij}(H_i) = 0 \tag{2.34}$$

$$m_{ij}(\neg H_i) = q_{ij}\,[1 - \mu_i(s_j)] \tag{2.35}$$

$$m_{ij}(E) = 1 - q_{ij} + q_{ij}\,\mu_i(s_j) \tag{2.36}$$

Obviously this result can be expressed by model 1 obtained for the generic problem in section 2.1, as soon as Cij is defined by:

$$C_{ij} = \mu_i(s_j) \tag{2.37}$$


2.3.2  Uncertain  Observation

We assume here that the measurements sj are uncertain observations of the discrimination features uj, so that only their a priori probability distribution p(sj/uj) is known. Furthermore, for each of the discrimination features uj themselves, prior knowledge provides the fuzzy membership function μi(uj) that characterizes the different hypotheses Hi, as before.

On this basis, possibility theory gives the possibility of each hypothesis Hi for each feature uj:

$$\Pi(H_i/u_j) = \mu_i(u_j) \tag{2.38}$$

as well as the possibility density of each feature uj for each measurement sj:

$$\Pi(u_j/s_j) = R_j\, p(s_j/u_j) \tag{2.39}$$

where Rj is a normalization factor such that:

$$R_j = \left[\max_{u_j}\{p(s_j/u_j)\}\right]^{-1} \tag{2.40}$$

As a possibility function is simply a consonant plausibility function, on the one hand (2.38) can in fact be deduced from (2.31), (2.32), and (2.33), recalling that in the latter sj is directly the feature uj, and on the other hand (2.39) can result from (2.12), (2.13), (2.14), and (2.15), when information is considered reliable (qij=1), if the set E of hypotheses becomes the infinite set of the possible values taken by uj (note that such an evaluation in a continuous framework is only possible for the plausibility function, and thanks to the special nature of the focal elements of mij(.) as defined by (2.13), (2.14), and (2.15)). The normalization factor Rj simply has to be particularized by (2.40), in agreement with (2.19), in order to satisfy the definition of a possibility distribution.

Then, (2.38) and (2.39) make it possible to derive the possibility of each hypothesis Hi for each measurement sj through:

$$\Pi(H_i/s_j) = \sup_{u_j}\{\Pi(H_i, u_j/s_j)\} \tag{2.41}$$

which can be developed as:

$$\Pi(H_i/s_j) = \sup_{u_j}\{\Pi(H_i/u_j) \wedge \Pi(u_j/s_j)\} \tag{2.42}$$

where ∧ stands for the conjunction operator, and finally:

$$\Pi(H_i/s_j) = \sup_{u_j}\{\min\{\mu_i(u_j),\ R_j\, p(s_j/u_j)\}\} \tag{2.43}$$


As a possibility function is but a particular plausibility function, the corresponding mass function which minimizes the specificity measure [4] can be expressed, after discounting according to the confidence factor qij:

$$m_{ij}(H_i) = 0 \tag{2.44}$$

$$m_{ij}(\neg H_i) = q_{ij}\left[1 - \sup_{u_j}\{\min\{\mu_i(u_j),\ R_j\, p(s_j/u_j)\}\}\right] \tag{2.45}$$

$$m_{ij}(E) = 1 - q_{ij} + q_{ij}\,\sup_{u_j}\{\min\{\mu_i(u_j),\ R_j\, p(s_j/u_j)\}\} \tag{2.46}$$

Once more, this result is obviously model 1 provided for the generic problem in section 2.1, as soon as Cij is now defined by:

$$C_{ij} = \sup_{u_j}\{\min\{\mu_i(u_j),\ R_j\, p(s_j/u_j)\}\} \tag{2.47}$$

in  which  Rj  is  still  given  by  (2.40). 


2.3.3  Imprecise  Observation 

The measurements sj are now imprecise observations of the discrimination features uj, so that they only provide a fuzzy membership function μj(uj). Nevertheless, for each of the discrimination features uj themselves, the prior knowledge still provides the fuzzy membership function μi(uj) that characterizes the different hypotheses Hi, as before.

The developments in these conditions are quite similar to those shown in section 2.3.2. The only difference concerns the expression of the possibility density of each feature uj for each measurement sj, which becomes:

$$\Pi(u_j/s_j) = \mu_j(u_j) \tag{2.48}$$

Consequently, this leads once again to model 1 provided for the generic problem in section 2.1, where Cij now has to be defined by:

$$C_{ij} = \sup_{u_j}\{\min\{\mu_i(u_j),\ \mu_j(u_j)\}\} \tag{2.49}$$
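The sup-min expressions (2.47) and (2.49) can be approximated on a grid of feature values. The sketch below is only an illustration: the trapezoidal membership functions, the Gaussian form of p(sj/uj), the numerical values, and the function names are all assumptions introduced for the example.

```python
import math

def sup_min(f, g, lo, hi, steps=4000):
    """Grid approximation of sup over u in [lo, hi] of min{f(u), g(u)}."""
    best = 0.0
    for k in range(steps + 1):
        u = lo + (hi - lo) * k / steps
        best = max(best, min(f(u), g(u)))
    return best

def trapezoid_membership(a, b, c, d):
    """Membership function equal to 0 outside (a, d) and to 1 on [b, c]."""
    def mu(u):
        if u <= a or u >= d:
            return 0.0
        if u < b:
            return (u - a) / (b - a)
        if u <= c:
            return 1.0
        return (d - u) / (d - c)
    return mu

# Hypothetical prior knowledge "Hi is a large object" (feature in metres).
mu_i = trapezoid_membership(8.0, 12.0, 30.0, 31.0)

# (2.47): uncertain observation, p(sj/uj) taken as Gaussian around uj.
s_j, sigma = 9.0, 1.0
def p_s_given_u(u):
    return math.exp(-0.5 * ((s_j - u) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
r_j = sigma * math.sqrt(2.0 * math.pi)   # (2.40): inverse of the maximum of p(sj/uj)
c_uncertain = sup_min(mu_i, lambda u: r_j * p_s_given_u(u), 0.0, 40.0)

# (2.49): imprecise observation, the reading is itself a fuzzy set mu_j(uj).
mu_j = trapezoid_membership(7.0, 9.0, 9.0, 11.0)   # triangular, peak at 9
c_imprecise = sup_min(mu_i, mu_j, 0.0, 40.0)
```

With these values the two membership functions of the second case cross at uj=10, where both equal 0.5, so c_imprecise is the height of their intersection.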

2.4  Summary  of  the  Models  Obtained 

A complete set of models has been developed in the previous sections, in the framework of the theory of evidence, according to the different kinds of data that have to be combined as regards measurements on the one hand, and prior knowledge on the other hand. All these models are particular cases of the two models provided for the generic problem in section 2.1, thanks to suitable expressions of Cij. The definitions of Cij for the different possible situations are summarized in table 1 for model 1 and in table 2 for model 2.

We may note that tables 1 and 2 furthermore provide expressions of Cij for precise and reliable prior knowledge, i.e. for a prior characterization of hypothesis Hi formalized by a deterministic value uij of the feature uj. In fact these expressions are simply provided either by statistical learning when p(uj/Hi)=δ(uj−uij), or by approximate prior knowledge when μi(uij)=1 and μi(uj)=0 for uj≠uij. We can moreover verify that the two approaches are coherent at this level in the common case of model 1.


| Measurements ↓ / Prior knowledge → | uij | p(uj/Hi) | μi(uj) |
|---|---|---|---|
| sj | 1 if sj = uij; 0 if sj ≠ uij | Rj*p(sj/Hi), Rj constrained by (2.19) | μi(sj) |
| p(sj/uj) | Rj*p(sj/uij), Rj constrained by (2.19) | Rj*∫ p(sj/uj)*p(uj/Hi) duj, Rj constrained by (2.25) | sup over uj of min{μi(uj), Rj*p(sj/uj)}, Rj defined by (2.40) |
| μj(uj) | μj(uij) | Rj*∫ μj(uj)*p(uj/Hi) duj, Rj constrained by (2.29) | sup over uj of min{μi(uj), μj(uj)} |

Table 1. Expression of Cij in generic model 1 for the different kinds of prior knowledge and measurement

| Measurements ↓ / Prior knowledge → | uij | p(uj/Hi) | μi(uj) |
|---|---|---|---|
| sj | 0.5 if sj = uij; 0 if sj ≠ uij | Rj*p(sj/Hi) / [1 + Rj*p(sj/Hi)], Rj > 0 | (*) |
| p(sj/uj) | Rj*p(sj/uij) / [1 + Rj*p(sj/uij)], Rj > 0 | Rj*[∫ p(sj/uj)*p(uj/Hi) duj] / [1 + Rj*∫ p(sj/uj)*p(uj/Hi) duj], Rj > 0 | (*) |
| μj(uj) | μj(uij) / [1 + μj(uij)] | Rj*[∫ μj(uj)*p(uj/Hi) duj] / [1 + Rj*∫ μj(uj)*p(uj/Hi) duj], Rj > 0 | (*) |

(*) Not consistent

Table 2. Expression of Cij in generic model 2 for the different kinds of prior knowledge and measurement


3  Target  Classification 

The target classification function consists in recognizing the type of a target, or even identifying it, on the basis of the different discriminating features sj delivered by the sensors Sj that observe it. The question is therefore to designate the most likely hypothesis Hi* in E={H1,...,HI} given this information. Such a decision, which is immediate when a probability can be associated a posteriori with each hypothesis, becomes quite delicate when the evaluations are presented in terms of mass functions of evidence theory. The whole difficulty revolves around the non-exclusivity of the evaluations, which raises the practical problem of how to interpret and take into account the masses attached to focal elements of cardinality 2 or greater when designating a unique singleton. This problem, which is general to evidence theory and unavoidable in the present context, has to date been addressed only by more or less satisfactory intuitive solutions.

So below, we propose three different approaches to the problem of choosing the most likely hypothesis Hi*, considering an arbitrary mass function m(.) on the frame of discernment E={H1,...,HI}, when no other a priori basis for discriminating among the Hi is retained.

A  synthesis  of  the  resulting  procedures  provides  a 
decision  law  suited  to  the  classification  problem.  When 
applied  to  the  modeled  mass  functions,  this  law  supplies 
classification  methods  of  noteworthy  interest. 

3.1  Minimum  of  Inconsistency 

This approach consists in defining I certain mass functions mi(.), each of them being respectively focused on each of the I hypotheses Hi of the frame of discernment E (mi(Hi)=1). The inconsistency Ki, provided by the orthogonal sum of the mass function mi(.) and the available mass function m(.), reflects their disagreement, and so represents the conflict between the assessment m(.) and the fact that hypothesis Hi is actually true. According to this, we have to choose the hypothesis Hi* that ensures a minimal inconsistency Ki. As Ki can be written:

$$K_i = 1 - Pl(H_i) \tag{3.1}$$

in  which  Pl(.)  is  the  plausibility  function  associated  with 
m(.),  we  have  to  choose  the  hypothesis  that  provides  a 
maximal  plausibility. 
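Relation (3.1) can be checked numerically. The sketch below is only an illustration under stated assumptions: mass functions are represented as dictionaries from frozensets to masses, the inconsistency is taken as the mass that the conjunctive combination sends to the empty set, and the mass values are hypothetical.

```python
from itertools import product

def combine(m1, m2):
    """Conjunctive combination before normalization: returns (masses, conflict)."""
    out, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            out[inter] = out.get(inter, 0.0) + x * y
        else:
            conflict += x * y   # mass falling on the empty set
    return out, conflict

def plausibility(m, subset):
    """Sum of the masses of the focal elements that intersect `subset`."""
    return sum(v for focal, v in m.items() if focal & subset)

frame = frozenset({"H1", "H2", "H3"})
m = {frozenset({"H1"}): 0.2, frozenset({"H1", "H2"}): 0.5, frame: 0.3}  # hypothetical

conflicts = {}
for h in sorted(frame):
    certain = {frozenset({h}): 1.0}        # mass function focused on Hi
    _, conflicts[h] = combine(certain, m)  # inconsistency Ki

# Minimum inconsistency and maximum plausibility designate the same hypothesis.
best = min(sorted(frame), key=conflicts.get)
```

Here the conflicts are 0, 0.2 and 0.7 for H1, H2 and H3, i.e. one minus the plausibilities 1, 0.8 and 0.3.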

The interest of this inconsistency criterion is confirmed by the idea of entropy that is connected with it [4].


3.2  «  Bayesian  »  Approach 

The idea here is to consider a given set of « equiprobable » Bayesian masses m0(.) on the frame of discernment E (m0(Hi)=1/I, ∀i∈[1,I]). Endowing this mass function m0(.) with a role similar to that of equiprobable a priori probabilities in Bayesian inference, a mass function mc(.) can be determined by orthogonal sum of the mass function m0(.) and the available mass function m(.). mc(.) is then a Bayesian mass function defined by:

$$m_c(H_i) = \frac{Pl(H_i)}{\sum_{k \in [1,I]} Pl(H_k)}, \quad i \in [1,I] \tag{3.2}$$

$$m_c(A) = 0, \quad \forall A \neq H_i,\ i \in [1,I] \tag{3.3}$$

in which Pl(.) is the plausibility function associated with m(.). By reference to the maximum a posteriori probability, the decision procedure obviously consists in retaining the hypothesis Hi* that has the maximum mass, and thus the maximum plausibility here again.
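The construction of mc(.) can be reproduced in a few lines. This is a minimal sketch, assuming that the orthogonal sum is Dempster's normalized rule and using a hypothetical mass function; the dictionary representation and the function names are choices made for the example.

```python
def dempster(m1, m2):
    """Orthogonal sum of two mass functions (dict: frozenset -> mass), normalized."""
    raw, conflict = {}, 0.0
    for a, x in m1.items():
        for b, y in m2.items():
            inter = a & b
            if inter:
                raw[inter] = raw.get(inter, 0.0) + x * y
            else:
                conflict += x * y
    return {focal: v / (1.0 - conflict) for focal, v in raw.items()}

def plausibility(m, subset):
    return sum(v for focal, v in m.items() if focal & subset)

hypotheses = ["H1", "H2", "H3"]
frame = frozenset(hypotheses)
m = {frozenset({"H1"}): 0.2, frozenset({"H1", "H2"}): 0.5, frame: 0.3}  # hypothetical

# "Equiprobable" Bayesian masses m0(Hi) = 1/I.
m0 = {frozenset({h}): 1.0 / len(hypotheses) for h in hypotheses}
mc = dempster(m0, m)

# Right-hand side of (3.2), for comparison with mc.
total = sum(plausibility(m, frozenset({h})) for h in hypotheses)
expected = {h: plausibility(m, frozenset({h})) / total for h in hypotheses}
decision = max(hypotheses, key=lambda h: mc[frozenset({h})])
```

All focal elements of mc are singletons, as stated by (3.3), and their masses coincide with the normalized plausibilities of (3.2).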

Conceptually, this approach replaces the total a priori uncertainty with an equal confidence in each of the singletons of the frame of discernment, so as to force the discrimination among these elements alone.

3.3  General  Approach  of  the  Decision  Problem

Here  we  look  for  a  solution  with  reference  to  a  more 
general  decisional  context  summarized,  for  example,  in 
[7].  The  purpose  is  to  choose  one  of  a  number  of 
possible  actions  to  take,  ah,  on  the  basis  of  the 
evaluation  provided  by  the  mass  set  m(.)  on  the  frame  of 
discernment  E. 

This choice can be made by maximizing a cost function C(ah) on the set of possible actions, knowing the weight G(ah/Bk) assigned to each potential action ah when the event Bk, a subset of E, occurs:

$$C(a_h) = \sum_{B_k \subseteq E} G(a_h/B_k)\, m(B_k) \tag{3.4}$$

The whole difficulty of using such a procedure in practice, and hence its credibility, resides in the evaluation of the weights G(ah/Bk), which is usually very subjective. While we may in general consider that the weights relative to the singletons Hi of E are given by the system or user, those relative to the subsets Bk of cardinality 2 or higher must, on the other hand, be determined intuitively, possibly in accordance with a preferred « attitude » [7].

Yet in our case, this subjective character can be greatly attenuated by the one-to-one correspondence we have to establish between the set of actions and the frame of discernment E, as each action ai consists in declaring a hypothesis Hi to be true. So, if there are no specific requirements, the weights are legitimately given by:

$$G(a_i/B_k) = 1 \quad \text{if } H_i \in B_k \tag{3.5}$$

$$G(a_i/B_k) = 0 \quad \text{if } H_i \notin B_k \tag{3.6}$$

so as to conform with the idea of mass m(Bk) introduced by evidence theory, i.e. an evaluation attached to one of the elements of Bk, although we cannot specify which one.

Under these conditions, (3.4) also leads to the designation of the hypothesis Hi* of maximum plausibility as the most likely.

Furthermore,  it  can  be  pointed  out  that  this  approach  is 
coherent  with  the  most  consensual  evaluation  of  the 
expected  loss,  among  those  proposed  for  instance  in  [8]. 
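The reduction of (3.4) to a maximum of plausibility under the weights (3.5) and (3.6) can be illustrated as follows. The mass function and the names are hypothetical; the sketch only evaluates the sum of (3.4) over the focal elements, since the other subsets carry no mass.

```python
def expected_gain(m, hypothesis, weight):
    """Equation (3.4) for the action 'declare `hypothesis`'."""
    return sum(weight(hypothesis, focal) * mass for focal, mass in m.items())

def membership_weight(hypothesis, focal):
    """Weights (3.5) and (3.6): 1 if Hi belongs to Bk, 0 otherwise."""
    return 1.0 if hypothesis in focal else 0.0

def plausibility(m, hypothesis):
    return sum(mass for focal, mass in m.items() if hypothesis in focal)

hypotheses = ["H1", "H2", "H3"]
m = {                                   # hypothetical mass function on E
    frozenset({"H2"}): 0.4,
    frozenset({"H1", "H3"}): 0.35,
    frozenset(hypotheses): 0.25,
}

gains = {h: expected_gain(m, h, membership_weight) for h in hypotheses}
decision = max(hypotheses, key=gains.get)
```

A different « attitude » would be expressed by passing another weight function in place of membership_weight.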

3.4  Synthesis 

The three approaches presented all converge to the same decision procedure, which consists in choosing the most likely hypothesis Hi* according to:

$$H_{i^*} = \arg\max_{i \in [1,I]} \{Pl(H_i)\} \tag{3.7}$$

It should also be noted that this decision law is the one that satisfies the constraint emphasized at the end of the developments concerning axiom 3 in appendix B.

For all the models discussed in section 2, as the hypotheses Hi are singletons of the frame of discernment E, the plausibility Pl(Hi) is proportional to the product of the Plkj(Hi) associated with the mkj(.), according to j and k. After normalization by the product of the Plkj(¬Hk) according to j and k, we come to designate Hi* by the criterion:

$$H_{i^*} = \arg\max_{i} \left\{\prod_{j} \frac{m_{ij}(H_i) + m_{ij}(E)}{m_{ij}(\neg H_i) + m_{ij}(E)}\right\} \tag{3.8}$$

This criterion can directly be applied to the two models provided for the generic problem, leading to the two respective solutions:

Solution 1:

$$H_{i^*} = \arg\max_{i} \left\{\prod_{j} \left[1 - q_{ij}(1 - C_{ij})\right]\right\} \tag{3.9}$$

Solution 2:

$$H_{i^*} = \arg\max_{i} \left\{\prod_{j} \frac{1 - q_{ij}(1 - C_{ij})}{1 - q_{ij}\, C_{ij}}\right\} \tag{3.10}$$

It should be noted that solution 1 also meets a maximum credibility criterion.

The simplicity of the calculations and ease of use of these solutions are worth noting.

Furthermore, if we want to integrate a relative a priori confidence λi∈[0,1] in the declaration of each of the different hypotheses Hi, or equivalently an expected risk (1−λi) attached to it, we may formalize this knowledge as a supplementary source of information. The plausibility function Pl0(.) of such a subjective source has simply to be defined on E for the singletons Hi, the only information we need in the following, thanks to direct interpretation of the λi:

$$Pl_0(H_i) = \lambda_i \tag{3.11}$$

Therefore, the criterion of maximum plausibility (3.7) becomes, once the orthogonal sum between Pl(.) and Pl0(.) is computed for the singletons Hi:

$$H_{i^*} = \arg\max_{i \in [1,I]} \{\lambda_i\, Pl(H_i)\} \tag{3.12}$$

and consequently (3.9) and (3.10) become:

Solution 1:

$$H_{i^*} = \arg\max_{i} \left\{\lambda_i \prod_{j} \left[1 - q_{ij}(1 - C_{ij})\right]\right\} \tag{3.13}$$

Solution 2:

$$H_{i^*} = \arg\max_{i} \left\{\lambda_i \prod_{j} \frac{1 - q_{ij}(1 - C_{ij})}{1 - q_{ij}\, C_{ij}}\right\} \tag{3.14}$$

Of course, in (3.9), (3.10), (3.13), and (3.14), Cij stands for any of the suitable expressions developed in section 2, according to the nature of the available data.
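The decision rules (3.9), (3.10), (3.13) and (3.14) reduce to a product over the sensors for each hypothesis. The following minimal sketch assumes that the qij and Cij are given as nested lists indexed by hypothesis i and then sensor j; the numerical values and the function names are hypothetical.

```python
from math import prod

def score_model1(q_row, c_row):
    """Product appearing in (3.9) for one hypothesis."""
    return prod(1.0 - q * (1.0 - c) for q, c in zip(q_row, c_row))

def score_model2(q_row, c_row):
    """Product appearing in (3.10); requires q*c < 1 for every sensor."""
    return prod((1.0 - q * (1.0 - c)) / (1.0 - q * c) for q, c in zip(q_row, c_row))

def decide(q, c, score, prior_confidence=None):
    """Return (index of the selected hypothesis, list of scores).

    With prior_confidence=None this is (3.9) or (3.10); otherwise the
    scores are weighted as in (3.13) or (3.14).
    """
    n = len(q)
    weights = prior_confidence if prior_confidence is not None else [1.0] * n
    scores = [weights[i] * score(q[i], c[i]) for i in range(n)]
    return max(range(n), key=scores.__getitem__), scores

# Two hypotheses (rows), two sensors (columns); hypothetical values.
q = [[1.0, 1.0], [0.9, 0.9]]
c = [[0.6, 0.2], [0.3, 0.5]]

best1, scores1 = decide(q, c, score_model1)                     # (3.9)
best2, scores2 = decide(q, c, score_model2)                     # (3.10)
best1_prior, _ = decide(q, c, score_model1, [1.0, 0.5])         # (3.13)
```

In this example both solutions select the second hypothesis, and halving its a priori confidence is enough to switch solution 1 to the first one.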


3.5  Illustration 

Figure 1 shows the mean probability of good recognition provided by the simulation of 2 sensors for a problem of discrimination between 2 hypotheses H1 and H2. In this very simple example both sensors are similar, as regards both their a priori good discrimination capability and the unreliability of their learning concerning hypothesis H2, in relation to an anticipated possible evolution of the context.


Figure 1. Classification with unreliable learning (plotted quantity: recognition rate)


More precisely, we are in the situation developed in section 2.2.1, and available learnings are given by normal distributions:

P(s1/H1) = P(s2/H1) = N(0,1), with q11 = q12 = 1
P(s1/H2) = P(s2/H2) = N(6,1), with q21 = q22 = q

while the measurements actually simulated follow:

P(s1/H1) = P(s2/H1) = N(0,1)

P(s1/H2) = N(S,1), P(s2/H2) = N(2,1)

So in this test sensor 2 effectively has wrong knowledge about H2, and the reliability of sensor 1 varies as a function of the signal S due to H2. This is in accordance with the choice of factors qij, which expresses a situation where a severe error concerning H2 may occur simultaneously on both sensors. In this context our attention has to focus on the values of S much lower than 6, i.e. typically S<4. The curves of figure 1 then emphasize the robustness of our approach (q=0.9), compared both with the probabilistic approach, which is a particular case of our method (q=1), and with each sensor alone; the probabilistic approach does not achieve this robustness with respect to each sensor alone.

Moreover, the ability of the qij factors to integrate linguistic or subjective information must be pointed out, considering the low sensitivity of the results to the value chosen for these coefficients [5].
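The experiment described above can be imitated with a small Monte Carlo sketch. Several points are assumptions and not taken from the text: model 1 with solution 1 (3.9) is used, Rj is set to the upper end of (2.19), both hypotheses are presented equally often, and the number of trials and the seed are arbitrary. The sketch is therefore not a reproduction of figure 1.

```python
import math
import random

R = math.sqrt(2.0 * math.pi)  # inverse of the maximum of a unit-variance normal density

def c_ij(s, learned_mean):
    """Model 1 of section 2.2.1: Cij = Rj * p(sj/Hi) with a learned N(mean, 1)."""
    return R * math.exp(-0.5 * (s - learned_mean) ** 2) / math.sqrt(2.0 * math.pi)

def decide(s1, s2, q):
    """Solution 1 (3.9) with q11 = q12 = 1 and q21 = q22 = q; returns 1 or 2."""
    score_h1 = c_ij(s1, 0.0) * c_ij(s2, 0.0)
    score_h2 = (1.0 - q * (1.0 - c_ij(s1, 6.0))) * (1.0 - q * (1.0 - c_ij(s2, 6.0)))
    return 2 if score_h2 > score_h1 else 1

def recognition_rate(signal, q, trials=20000, seed=0):
    """Mean rate of good recognition when H1 and H2 are presented equally often."""
    rng = random.Random(seed)
    good = 0
    for _ in range(trials):
        # H1 presented: both sensors behave as learned.
        if decide(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), q) == 1:
            good += 1
        # H2 presented: sensor 1 sees N(S, 1), sensor 2 sees N(2, 1).
        if decide(rng.gauss(signal, 1.0), rng.gauss(2.0, 1.0), q) == 2:
            good += 1
    return good / (2 * trials)

rate_robust = recognition_rate(signal=2.0, q=0.9)
rate_probabilistic = recognition_rate(signal=2.0, q=1.0)
```

With q=1 the rule reduces to comparing the products of learned likelihoods, so H2 is declared only when s1+s2>6; with q<1 the score of H2 cannot fall below (1−q)², which is what lets H2 be declared when both readings are far from every learned distribution.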

4  Management  of  Heterogeneous  Frames  of 
Discernment 

Most often we have to use a sensor S1 that provides a mass function m1(.) on a frame of discernment E1, but in fact we need an assessment on a frame of discernment E such that E1⊂E, either for immediate decision making, or for combination with other mass functions available on complementary frames of discernment. For instance this may occur in case of incomplete learning, or when some hypotheses are not observable, as regards sensor S1. So we have to express m1(.) on E.

The basic solution to that problem consists in deconditioning m1(.) from E1 to E (see appendix A). The advantage of this solution is that it does not require any particular condition, and can therefore always be implemented. Nevertheless, as we shall see, in most cases it is not the best approach.

A first alternative solution is possible when m1(.) is obtained from the modeling developed in section 2, i.e. when it is the orthogonal sum, according to i, of the mass functions mi1(.) defined by (2.3), (2.4), and (2.5), or by (2.6), (2.7) and (2.8). Then, the absence of any Ci1, characterized by qi1=0, amounts to ignoring the corresponding mass function mi1(.), as it becomes a trivial mass function (mi1(E)=1), and is therefore a neutral element of the orthogonal sum. Moreover, the other elementary models mi1(.) are not modified whatever the frame of discernment is, as their focal elements integrate in ¬Hi any hypothesis that is different from Hi, so that ¬Hi may naturally include all the missing hypotheses. Compared with the previous method of deconditioning at the level of m1(.), this approach loses less specificity [4] with respect to the initial information.

If m1(.) is directly used in a decision process, we have to point out that both approaches provide a maximal plausibility for missing hypotheses, as the latter always have a plausibility equal to 1. Decision rule (3.12) is then more suitable than decision rule (3.7), as λi can integrate the disparity in the quantity of information that is available for each hypothesis.

Another kind of approach is presented hereafter, which is particular to situations requiring the orthogonal sum of several sources Sj defined on non-disjoint respective frames of discernment Ej such that ∪Ej=E.


4.1  Plausibility  Correction  Method 

This method (see for instance [9]) performs a global treatment of the available sources, dealing with unlike frames of discernment and source combination together in a single process. Nevertheless, to implement it, the sources to combine must be defined on frames that are directly or indirectly connected. This means that for a given source, there must exist at least one other source such that their respective frames are not disjoint, and thus have a common part.

Furthermore, this method is based on properties that only the plausibility measure satisfies. Due to one of these properties, the method can only deduce the plausibility of each hypothesis. In fact, there exists an infinity of mass functions that correspond to this set of plausibilities. In particular, we can find the one that corresponds to the application of the minimum specificity criterion. Nevertheless, the relevant information remains based only on these few plausibilities. Moreover, the maximum of plausibility is the decision criterion that is most often used in this theory, as justified in section 3. So, considering this criterion as the most suitable in accordance with the context introduced, the plausibility of each hypothesis is the only information that has to be expressed in the following.

Let S1 and S2 be two sources respectively defined on E1 and E2, such that E=E1∪E2 and Ec=E1∩E2. The plausibility Pl(Hi) of each hypothesis Hi after combination of both sources S1 and S2 can be rigorously expressed on the frame E as a function of the information actually available for each source in relation to their respective mass functions m1(.) on E1 and m2(.) on E2. After suitable simplifications we obtain the following formulation of the relative plausibility of each hypothesis:

$$Pl(H_i) = Pl_1(H_i), \quad \forall H_i \in (E_1 - E_c) \tag{4.1}$$

$$Pl(H_i) = Pl_1(H_i)\, Pl_2(H_i/E_c), \quad \forall H_i \in E_c \tag{4.2}$$

$$Pl(H_i) = Pl_2(H_i)\, \frac{Pl_1(E_c)}{Pl_2(E_c)}, \quad \forall H_i \in (E_2 - E_c) \tag{4.3}$$

Because of the simplifications, the expressions provided are only proportional to plausibilities. Nevertheless, as the method is used in relation to a decision criterion of maximum of plausibility, the proportionality ratios of the plausibilities of the hypotheses are the only necessary information. Furthermore, symmetrical expressions can of course be obtained by permutation of the sources. Therefore, the decision is the same whatever the choice of development.

An intuitive explanation of the plausibility correction method is the following. The method consists in choosing a reference source, and in refining and completing its knowledge by means of the other sources. The reference source is S1 in the present development, but this choice has no influence, as mentioned previously.

First, plausibilities of the hypotheses considered by S1 and not common to the other source remain unchanged, which corresponds to expression (4.1). Indeed, the other source gives no information about these hypotheses.

The  second  step  consists  in  refining  the  knowledge  the 
reference  source  has  on  the  common  hypotheses,  thanks 
to  a  fusion  with  the  other  source  on  their  common  part 
Ec.  Indeed,  the  latter  represents  the  only  subset  on  which 
it  is  legitimate  to  implement  any  fusion.  This  step 
corresponds  to  the  expression  (4.2).  Obviously,  such  a 
step  can  exist  only  if  there  are  at  least  two  hypotheses 
contained  in  the  common  part  Ec. 

Lastly, knowledge of the reference source is completed by readjusting the plausibility of each hypothesis that is considered by S2 and not by S1. Each of them is redefined relative to the plausibilities of the hypotheses of E1, with the common part Ec used as a pivot. This step simply consists in multiplying each of the plausibilities peculiar to S2 by a single factor Pl1(Ec)/Pl2(Ec). Such a factor preserves the proportionality ratio between the plausibilities of all of the hypotheses considered by the source S2. This operation corresponds to expression (4.3).

Generalization of the plausibility correction method to more than two sources requires an order of fusion to be determined, because this method is based on a non-associative operation. The best approach consists in always combining the sources that have the largest common part. With such an order, readjustments are based on the largest pivot, and so are more reliable. Furthermore, the maximum number of hypotheses is involved in the fusion on the common part. Nevertheless, for some configurations of sources, this order can still lead to several developments that differ from a decision point of view. It is then necessary to determine the order that leads to the best performance, by means of a suitable criterion.
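For two sources, expressions (4.1) to (4.3) can be sketched as below. The mass functions are hypothetical, and one point is an assumption rather than something stated in the text: the conditional plausibility Pl2(Hi/Ec) is computed by Dempster conditioning, i.e. Pl2(Hi)/Pl2(Ec) for Hi in Ec.

```python
def plausibility(m, subset):
    """Sum of the masses of the focal elements that intersect `subset`."""
    subset = frozenset(subset)
    return sum(v for focal, v in m.items() if focal & subset)

def corrected_plausibilities(m_ref, e_ref, m_other, e_other):
    """Relative plausibilities (4.1)-(4.3), the first source being the reference."""
    e_ref, e_other = frozenset(e_ref), frozenset(e_other)
    e_c = e_ref & e_other
    pl_ref_c = plausibility(m_ref, e_c)
    pl_other_c = plausibility(m_other, e_c)   # must be non-zero
    out = {}
    for h in e_ref - e_c:                     # (4.1)
        out[h] = plausibility(m_ref, {h})
    for h in e_c:                             # (4.2), conditioning assumed as above
        out[h] = plausibility(m_ref, {h}) * plausibility(m_other, {h}) / pl_other_c
    for h in e_other - e_c:                   # (4.3)
        out[h] = plausibility(m_other, {h}) * pl_ref_c / pl_other_c
    return out

E1, E2 = {"H1", "H2", "H3"}, {"H2", "H3", "H4"}
m1 = {frozenset({"H1"}): 0.4, frozenset({"H2", "H3"}): 0.4, frozenset(E1): 0.2}
m2 = {frozenset({"H4"}): 0.5, frozenset({"H2"}): 0.3, frozenset(E2): 0.2}

pl_ref1 = corrected_plausibilities(m1, E1, m2, E2)   # S1 as reference
pl_ref2 = corrected_plausibilities(m2, E2, m1, E1)   # S2 as reference
decision = max(sorted(pl_ref1), key=pl_ref1.get)
```

In this example the two developments differ only by a constant factor, so they designate the same hypothesis, in line with the remark that the choice of the reference source has no influence.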


4.2  Introduction  of  Compatibility  Relations 

All  the  approaches  that  have  been  introduced  can  be 
extended  in  order  to  integrate  further  knowledge  about 
some  similarity  that  may  exist  between  missing 
hypotheses  in  the  initial  frame  of  discernment  and  the 
hypotheses  that  are  considered  in  it,  as  regards  the 
features  that  are  processed. 

Such a similarity can be described by a compatibility relation ω1 that associates with a considered hypothesis Hi of E1 the set ω1(Hi) of the hypotheses of (E−E1) to which Hi is similar, and that satisfies:

$$\omega_1(A) = \bigcup_{H_i \in A} \omega_1(H_i) \tag{4.4}$$

where ω1(A) represents the set of missing hypotheses to which the considered hypotheses of E1 contained in A are similar. Of course the approach imposes that every missing hypothesis is compatible with at least one of the hypotheses considered.

The deconditioning method can then be modified so that it now consists in appending (in the set union sense) to each focal element defined on E1 only the missing hypotheses of (E−E1) with which this focal element is compatible. The aim is to consider that this element contains the hypotheses to which the missing hypotheses in question have most plausibly transferred their evidence. Thus, the deconditioned mass function becomes:

$$m(A \cup \omega_1(A)) = m_1(A), \quad \forall A \in N_1 \tag{4.5}$$

$$m(A) = 0, \quad \forall A \subseteq E,\ A \notin N_1 \tag{4.6}$$

where N1 is the set of the focal elements of m1(.).
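The modified deconditioning (4.5) can be sketched as follows. The code assumes that plain deconditioning (appendix A, not reproduced in this section) appends every missing hypothesis of (E−E1) to each focal element, as the description above suggests; the mass values, the compatibility relation and the names are hypothetical.

```python
def decondition(m1, frame_e1, frame_e, compat=None):
    """Express a mass function defined on E1 on the larger frame E.

    m1     : dict mapping frozenset (focal element on E1) to its mass.
    compat : dict mapping a hypothesis of E1 to the missing hypotheses
             similar to it (omega_1); None appends all missing hypotheses.
    """
    missing = frozenset(frame_e) - frozenset(frame_e1)
    out = {}
    for focal, mass in m1.items():
        if compat is None:
            extra = missing
        else:
            # (4.4): omega_1(A) is the union of the omega_1(Hi) for Hi in A.
            extra = frozenset().union(*(compat.get(h, ()) for h in focal)) & missing
        new_focal = focal | extra                       # (4.5)
        out[new_focal] = out.get(new_focal, 0.0) + mass
    return out

def plausibility(m, hypothesis):
    return sum(v for focal, v in m.items() if hypothesis in focal)

E1 = {"H1", "H2", "H3"}
E = {"H1", "H2", "H3", "H4"}
m1 = {frozenset({"H1"}): 0.5, frozenset({"H3"}): 0.3, frozenset({"H2", "H3"}): 0.2}
omega1 = {"H1": {"H4"}, "H2": {"H4"}, "H3": set()}   # H4 similar to H1 and H2

m_plain = decondition(m1, E1, E)
m_compat = decondition(m1, E1, E, omega1)
```

With plain deconditioning the missing hypothesis H4 has plausibility 1; with the compatibility relation it drops to 0.7, because the focal element {H3} no longer absorbs H4.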


Besides, the approach using the modeling developed in section 2 can also be modified according to the same idea, providing on E, for each Hi∈E1:

Model 1:

$$m_{i1}(H_i \cup \omega_1(H_i)) = 0 \tag{4.7}$$

$$m_{i1}(\neg(H_i \cup \omega_1(H_i))) = q_{i1}\,(1 - C_{i1}) \tag{4.8}$$

$$m_{i1}(E) = 1 - q_{i1}\,(1 - C_{i1}) \tag{4.9}$$

Model 2:

$$m_{i1}(H_i \cup \omega_1(H_i)) = q_{i1}\, C_{i1} \tag{4.10}$$

$$m_{i1}(\neg(H_i \cup \omega_1(H_i))) = q_{i1}\,(1 - C_{i1}) \tag{4.11}$$

$$m_{i1}(E) = 1 - q_{i1} \tag{4.12}$$

Then m1(.) remains simply the orthogonal sum, according to i, of the mass functions mi1(.) defined by (4.7), (4.8), and (4.9), or by (4.10), (4.11) and (4.12).

Concerning the plausibility correction method, some terms in the expression of the plausibility Pl(Hi) obtained for each hypothesis Hi on the frame E, after combination of both sources S1 and S2, increase in specificity as a source may now bring information on its missing hypotheses thanks to compatibility relations. After suitable simplifications the relative plausibility of each hypothesis becomes:

$$Pl(H_i) = Pl_1(H_i)\, Pl_2\Big(\bigcup_{\omega_2(H_k) \cap H_i \neq \emptyset} H_k\Big), \quad \forall H_i \in (E_1 - E_c) \tag{4.13}$$

$$Pl(H_i) = Pl_1(H_i)\, Pl_2(H_i/E_c), \quad \forall H_i \in E_c \tag{4.14}$$

$$Pl(H_i) = Pl_2(H_i)\, Pl_1\Big(\bigcup_{\omega_1(H_k) \cap H_i \neq \emptyset} H_k\Big)\, \frac{Pl_1(E_c)}{Pl_2(E_c)}, \quad \forall H_i \in (E_2 - E_c) \tag{4.15}$$

All the remarks emphasized in section 4.1 remain true.

4.3  Illustration 

The deconditioning approach and the plausibility correction method are applied to two sources S1 and S2 that are respectively defined on the frames E1={H1,H2,H3} and E2={H2,H3,H4}. S1 and S2 are respectively SAR and infrared images of the ground, and the four classes to discriminate are: field (H1), water (H2), forest (H3), building (H4). The conditional probability densities of the features observed by each source with respect to each hypothesis of its frame are identified on actual images as normal distributions that are assumed to be perfectly representative. Consequently mass functions are built according to the modeling provided in section 2.2.1, with qij=1 for all i and j.

Processing of available images provides a confusion matrix for each source alone (tables 3 and 4), and for both fusion methods. An element of such a matrix is the declaration rate of a hypothesis Hk (column) when a hypothesis Hi is presented (row).


| Presentation ↓ / Declaration → | H1 | H2 | H3 | H4 |
|---|---|---|---|---|
| H1 | .899 | .069 | .030 | 0 |
| H2 | .492 | .499 | .008 | 0 |
| H3 | .526 | .075 | .398 | 0 |
| H4 | .837 | .129 | .034 | 0 |

Table 3. Confusion matrix of source 1


| Presentation ↓ / Declaration → | H1 | H2 | H3 | H4 |
|---|---|---|---|---|
| H1 | 0 | .712 | .032 | .254 |
| H2 | 0 | .925 | .016 | .059 |
| H3 | 0 | .571 | .224 | .204 |
| H4 | 0 | .108 | .003 | .889 |

Table 4. Confusion matrix of source 2


Figure 2 shows the recognition rates (diagonal elements of the confusion matrix) obtained for both fusion methods. The main difference is the advantage of the plausibility correction method as regards H1, thanks to a separate processing of H1, H2∪H3, and H4 in this case.


Figure 2. Recognition rate of deconditioning (gray) and plausibility correction method (black), for hypotheses H1 to H4


Figure 3 shows the same results when compatibility relations are implemented. In accordance with the individual confusion matrix of each source, we assume that H4 is compatible with H1 and H2 as regards source 1, and that H1 is compatible with H2 and H4 as regards source 2. The only difference with the previous results concerns the better recognition of H2, thanks to compatibility relations that avoid any confusion between this hypothesis and the missing hypotheses of each source.


Figure 3. Recognition rate with compatibility relations
(deconditioning in gray, plausibility correction in black),
for hypotheses H1 to H4


5  Integration  of  Contextual  Knowledge 

Contextual information can be integrated in
all the previous processing methods thanks to the
confidence factors $q_{ij}$ that have been introduced to this
end in the models elaborated in section 2. As pointed
out in section 3.5, robust values of the $q_{ij}$ can be
set beforehand for the different situations
to be identified, provided that sufficiently clear
relations can be established between these situations and
the effect of the confidence factors. When the problem is
too complex for this, more efficient
learning methods have to be implemented. Specific
neural approaches have for instance been developed to
this end [10].

Nevertheless, as the $q_{ij}$ are continuous variables, their
most efficient use consists in computing them on line on
the basis of contextual parameters, provided that the
latter are observable. Such a solution is presented in
section 5.1, but considering the operational interest of
adaptive processing based on contextual parameter
measurements, the following introduces different
solutions to this approach in a common framework [11,
12].

So we assume that a particular context $z = \{z_1, \dots, z_P\}$ is
defined on a $P$-dimensional space $Z$ by $P$ contextual
variables $z_u$ that allow the sensor reliability to be evaluated.

Moreover, the vector $z_m = \{z_{1m}, \dots, z_{Pm}\}$ represents the
context measurements that are available.

The problem is to integrate the context measurements
$z_m$ in classification processes such as those developed in
section 3, in order to improve their robustness to context
variations. Two methods can be established to achieve
such an integration. Both of them implement the same
combination rule between hypothesis assessments
coming from sensor measurements on the one hand, and
reliability information based on context measurements
on the other hand, but at different levels. This rule is
called the CC Rule (Contextual Combination Rule). Its
presentation first requires the definition of:

•  the inclusive validity domain $D_v$ as the fuzzy subset
of contexts ($D_v \subset Z$) in which a mass function $m_v(.)$ is
valid;

•  the index $W$ as a subset of the set $V$ of all the indexes
$v$ that characterize the different available mass
functions $m_v(.)$;

•  the exclusive validity domain $d_W$ as the fuzzy subset
of contexts ($d_W \subset Z$) in which every mass function
$m_v(.)$ such that $v \in W$ is valid, but no other one ($v \notin W$)
is valid:

$d_W = \bigcap_{v \in W} D_v \cap \bigcap_{v \notin W} \neg D_v , \quad \forall W \subseteq V,\ W \neq \emptyset$   (5.1)

$d_\emptyset = \bigcap_{v \in V} \neg D_v$   (5.2)

In its general formulation the CC Rule consists in
finding a global mass function $m(.)$ on the frame of
discernment $E$, considering on the one
hand the mass functions $m_W(.)$, each provided on $E$ by
the orthogonal sum of the mass functions $m_v(.)$ such that
$v \in W$, and on the other hand a mass function $m_c(.)$ on
$E_c = \{d_W\}$. The latter is assumed to be a Bayesian mass
function which expresses the relative confidence we have
in the different mass functions $m_W(.)$.

First, a mass function $m'(.)$ on $E_c \times E$ is obtained from
the mass function $m_c(.)$ relative to $E_c$ and the mass
functions $m_W(.)$ relative to $E$, which has to be such that:

•  the coarsening of $m'(.)$ from $E_c \times E$ to $E_c$ leads to the
mass function $m_c(.)$;

•  the conditioning of $m'(.)$ from $E_c \times E$ to $d_W \times E$
provides the mass function $m_W(.)$.

These conditions are verified if:

$m'(d_W \times A) = m_c(d_W) \cdot m_W(A) , \quad \forall A \subseteq E,\ \forall W \subseteq V$   (5.3)

Then the final mass function $m(.)$ on $E$ is obtained by
coarsening the mass function $m'(.)$ from $E_c \times E$ to $E$:

$m(A) = m_c(d_\emptyset) \cdot m_\emptyset(A) + \sum_{W \subseteq V,\ W \neq \emptyset} m_c(d_W) \cdot m_W(A) , \quad \forall A \subseteq E$   (5.4)
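
As a minimal sketch of (5.4), the CC Rule can be read as a weighted mixture of the mass functions $m_W(.)$, with the Bayesian masses $m_c(d_W)$ as weights. The function name `cc_rule`, the dictionary representation of mass functions, and the numerical values are assumptions made for illustration; $m_\emptyset(.)$ is taken here as total ignorance ($m_\emptyset(E) = 1$), as in (5.21).

```python
def cc_rule(mc, masses, frame):
    """Contextual Combination Rule (5.4): weight each m_W by m_c(d_W).

    mc     : dict {W (frozenset of indexes): m_c(d_W)}; frozenset() stands for d_0.
    masses : dict {non-empty W: mass function m_W}, each mass function being
             a dict {focal element (frozenset of hypotheses): mass}.
    frame  : frozenset of all hypotheses (the frame of discernment E).
    """
    if abs(sum(mc.values()) - 1.0) > 1e-9:
        raise ValueError("m_c must be Bayesian: its masses must sum to 1")
    m = {}
    for W, weight in mc.items():
        # Assumption: m_0 expresses total ignorance, all its mass is on E.
        m_W = masses[W] if W else {frame: 1.0}
        for A, mass in m_W.items():
            m[A] = m.get(A, 0.0) + weight * mass
    return m


# Illustrative values only: one available mass function (index 1), valid with
# confidence 0.8 in the observed context.
E = frozenset({"H1", "H2"})
H1 = frozenset({"H1"})
mc = {frozenset(): 0.2, frozenset({1}): 0.8}
masses = {frozenset({1}): {H1: 0.6, E: 0.4}}
m = cc_rule(mc, masses, E)
```

With these values the rule keeps 0.8 x 0.6 on $H_1$ and sends the rest to $E$, so a less valid context simply pushes the result towards ignorance.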


5.1  Local  Contextual  Combination  Method  (LCCM) 

In this case a mass function $m_{cij}(.)$ is associated with
each elementary mass function $m_{ij}(.)$ used by the models
provided in section 2 (in which now $q_{ij} = 1$), in order to
characterize its reliability. $m_{cij}(.)$ is established in the
following on the frame of discernment $E_{cij} = \{D_{ij}, \neg D_{ij}\}$,
where $D_{ij}$ is the inclusive validity domain of $m_{ij}(.)$.

Let the context $z = \{z_1, \dots, z_P\}$ be a random vector of
probability density $p(z/z_m)$ where $z_m = \{z_{1m}, \dots, z_{Pm}\}$ is
the measurement vector associated with $z$. Besides, the
validity domain $D_{ij}$ is defined according to each
contextual parameter $z_u$, in the framework of fuzzy set
theory, by an elementary membership function $\mu_{iju}(z_u)$.
The membership function $\mu_{ij}(z)$ that characterizes the
validity domain $D_{ij}$ according to the context $z$ is
therefore expressed:

$\mu_{ij}(z) = \min_u \{\mu_{iju}(z_u)\}$   (5.5)

The probability $P(S_j/H_i, z_m)$ that the sensor $S_j$ is reliable
for the assessment of $H_i$ if the context observation is $z_m$
can then be obtained thanks to the definition of the probability
of a fuzzy event:

$P(S_j/H_i, z_m) = \int \mu_{ij}(z) \cdot p(z/z_m) \, dz$   (5.6)

Of course, when the contextual variables are certain, the
probability density $p(z/z_m)$ is replaced by the Dirac
function $\delta(z - z_m)$, and (5.6) becomes:

$P(S_j/H_i, z_m) = \mu_{ij}(z_m)$   (5.7)

The probability (5.6) can in any case be finally
formalized as a Bayesian mass function $m_{cij}(.)$ such
that:

$m_{cij}(D_{ij}) = P(S_j/H_i, z_m)$   (5.8)

$m_{cij}(\neg D_{ij}) = 1 - P(S_j/H_i, z_m)$   (5.9)

$m_{cij}(E_{cij}) = 0$   (5.10)


Two mass functions $m_{Wij}(.)$ ($W \in \{1,2\}$) have now to be
introduced: one of them uses the measurements as if
they were completely reliable ($W = 1$), while the other is
representative of total ignorance ($W = 2$). These mass
functions are therefore defined by:

$m_{1ij}(.) = m_{ij}(.)$   (5.11)

$m_{2ij}(E) = 1$   (5.12)

Applying the CC Rule to the particular mass functions
such that $m_c(.)$ is $m_{cij}(.)$ and $m_W(.)$ are $m_{Wij}(.)$
($W \in \{1,2\}$) provides a modified mass function $m'_{ij}(.)$
that replaces the mass function $m_{ij}(.)$ in the models
developed in section 2:

$m'_{ij}(H_i) = P(S_j/H_i, z_m) \cdot m_{ij}(H_i)$   (5.13)

$m'_{ij}(\neg H_i) = P(S_j/H_i, z_m) \cdot m_{ij}(\neg H_i)$   (5.14)

$m'_{ij}(E) = 1 - P(S_j/H_i, z_m) + P(S_j/H_i, z_m) \cdot m_{ij}(E)$   (5.15)

The orthogonal sum of all the modified mass functions
$m'_{ij}(.)$ according to $i$ and $j$ provides finally the global
mass function $m(.)$ that is used in the decision rule, as
explained in section 3.

In fact the resulting expression of $m'_{ij}(.)$ is similar to the
initial expression of $m_{ij}(.)$ in which the confidence factor
$q_{ij}$ would be such that:

$q_{ij} = P(S_j/H_i, z_m)$   (5.16)

So, the LCCM is strictly the method developed in
section 3, in which $q_{ij}$ is simply expressed by (5.16),
(5.6), and (5.5).
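
The chain (5.5), (5.6), (5.16), and (5.13) to (5.15) can be sketched as follows. This is only an illustration under stated assumptions: the integral (5.6) is approximated by an average over samples drawn from $p(z/z_m)$, the ramp membership function and all numerical values are hypothetical, and the function names are not taken from the source.

```python
def membership(z, elementary):
    """mu_ij(z) = min over u of mu_iju(z_u), equation (5.5)."""
    return min(mu_u(z_u) for mu_u, z_u in zip(elementary, z))


def reliability(samples, elementary):
    """Sample-average approximation of (5.6); `samples` are draws of z from
    p(z/z_m). A single sample equal to z_m reproduces the certain case (5.7)."""
    return sum(membership(z, elementary) for z in samples) / len(samples)


def discount(m_ij, q, frame):
    """Modified mass function (5.13)-(5.15) with q = P(S_j/H_i, z_m) (5.16):
    every mass is scaled by q and the remainder 1 - q is transferred to E."""
    out = {A: q * mass for A, mass in m_ij.items()}
    out[frame] = out.get(frame, 0.0) + 1.0 - q
    return out


# Hypothetical membership function of a single contextual variable:
# validity rises linearly from 0 to 1 between z = 0 and z = 0.5.
def ramp(t):
    return max(0.0, min(1.0, t / 0.5))


E = frozenset({"H1", "H2"})
H1, NOT_H1 = frozenset({"H1"}), frozenset({"H2"})
m_ij = {H1: 0.7, NOT_H1: 0.1, E: 0.2}

q = reliability([(0.25,)], [ramp])   # certain context z_m = 0.25
m_prime = discount(m_ij, q, E)
```

Here the context is only half inside the validity domain, so half of each mass is moved to $E$.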

5.2  Global  Contextual  Processing  Method  (GCPM) 

A unique mass function $m_c(.)$ is now considered to
characterize the relative validity of all the mass functions
$m_{ij}(.)$ and all their possible combinations. If $W$ is a
subset of $V = \{1, \dots, I\} \times \{1, \dots, J\}$, $m_c(.)$ is more precisely
defined on the frame of discernment $E_c = \{d_W\}$, provided
that $d_W$ stands for the fuzzy subset of contexts ($d_W \subset Z$)
where all of the $m_{ij}(.)$ such that $(i,j) \in W$ are valid, but
only these ones.

The membership function $\mu_{ij}(z)$ that characterizes the
validity domain $D_{ij}$ of $m_{ij}(.)$ according to the context $z$
remains expressed as in section 5.1 by (5.5).

The probability of validity relative to the association $W$
of mass functions $m_{ij}(.)$ is the probability of the
conjunction between their respective fuzzy subsets $D_{ij}$,
when a contextual observation $z_m$ is available:

$P\big(\bigcap_{(i,j) \in W} D_{ij} / z_m\big) = \int \big[\min_{(i,j) \in W} \{\mu_{ij}(z)\}\big] \cdot p(z/z_m) \, dz$   (5.17)

The exclusive probability of validity $P(d_W)$ relative to
the association $W$ of mass functions $m_{ij}(.)$ can therefore
be obtained on the basis of (5.17), thanks to
developments similar to those provided in [13]. This
probability is directly the Bayesian mass function $m_c(.)$
that has to be expressed on $E_c$:

$m_c(d_W) = \sum_{W' \subseteq V,\ W \subseteq W'} (-1)^{|W'-W|} \cdot P\big(\bigcap_{(i,j) \in W'} D_{ij} / z_m\big) , \quad \forall W \neq \emptyset$   (5.18)

$m_c(d_\emptyset) = P\big(\bigcap_{(i,j) \in V} \neg D_{ij} / z_m\big)$   (5.19)

$|W'-W|$ represents the cardinal of the subset $W'-W$.

Correlatively, each mass function $m_W(.)$ is the
orthogonal sum of the mass functions $m_{ij}(.)$ such that
$(i,j) \in W$, if they exist:

$m_W(.) = \bigoplus_{(i,j) \in W} m_{ij}(.) , \quad \forall W \neq \emptyset$   (5.20)

and $m_\emptyset(.)$ corresponds to the absence of information:

$m_\emptyset(E) = 1$   (5.21)

Thus, the CC Rule can be applied in its general form on
(5.18), (5.19), (5.20), and (5.21), to provide the global
mass function $m(.)$ that is used in the decision rule, as
explained in section 3:

$m(.) = m_c(d_\emptyset) \cdot m_\emptyset(.) + \sum_{W \subseteq V,\ W \neq \emptyset} m_c(d_W) \cdot m_W(.)$   (5.22)
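
The alternating sum (5.18) can be sketched as below. Two assumptions are made and should be checked against [13]: the integral (5.17) is approximated by an average over samples of $p(z/z_m)$, and $m_c(d_\emptyset)$ of (5.19) is evaluated by the same alternating sum, taking the probability of the empty conjunction as 1. Function names and values are illustrative only.

```python
from itertools import combinations


def subsets(items):
    """All subsets of `items`, as frozensets, the empty set included."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def joint_validity(W, mu, samples):
    """Sample-average approximation of (5.17): mean of min_{(i,j) in W} mu_ij(z).
    Assumption: the empty conjunction has probability 1."""
    if not W:
        return 1.0
    return sum(min(mu[v](z) for v in W) for z in samples) / len(samples)


def exclusive_masses(mu, samples):
    """Bayesian mass m_c(d_W) for every W subset of V, following (5.18)."""
    V = frozenset(mu)
    P = {W: joint_validity(W, mu, samples) for W in subsets(V)}
    return {
        W: sum((-1) ** len(Wp - W) * P[Wp] for Wp in subsets(V) if W <= Wp)
        for W in subsets(V)
    }


# Illustrative values: two mass functions whose membership degrees in the
# observed (certain) context are 0.9 and 0.6.
mu = {(1, 1): lambda z: 0.9, (1, 2): lambda z: 0.6}
mc = exclusive_masses(mu, samples=[0.0])
```

In this example the masses are 0.3 for $\{(1,1)\}$ alone, 0 for $\{(1,2)\}$ alone, 0.6 for both, and 0.1 for none; with a min-based conjunction the less valid mass function is never considered valid on its own.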

5.3  Unified  Formalism 

A practical case of interest consists in adding a further
hypothesis to $E$, corresponding to an unexpected object
for which no previous learning is available. This
problem can be managed thanks to the models developed
in section 2 that are used here, as explained in section 4.
The modifications it involves in both methods, LCCM
and GCPM, allow them to be expressed in the same formalism.

Such a common formalism consists in implementing
(5.20), (5.21), and (5.22) in the new extended frame of
discernment. The difference between both methods
appears in the expression of $m_c(.)$ which remains given
by (5.5), (5.17), (5.18), and (5.19) as regards GCPM,
while LCCM has to use:

$m_c(d_W) = \prod_{(i,j) \in W} q_{ij} \cdot \prod_{(i,j) \notin W} (1 - q_{ij})$   (5.23)

$m_c(d_\emptyset) = \prod_{(i,j) \in V} (1 - q_{ij})$   (5.24)

with $q_{ij}$ defined by (5.16).
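
A short sketch of (5.23) and (5.24) follows; it treats the validity of each mass function as an independent event of probability $q_{ij}$. The function name and the numerical values are illustrative assumptions.

```python
from itertools import combinations
from math import prod


def lccm_context_masses(q):
    """m_c(d_W) for every subset W of V, following (5.23); the empty subset
    gives (5.24). `q` maps each index (i, j) to its confidence factor q_ij."""
    V = list(q)
    mc = {}
    for r in range(len(V) + 1):
        for combo in combinations(V, r):
            W = frozenset(combo)
            valid = prod(q[v] for v in W)                        # members of W valid
            invalid = prod(1.0 - q[v] for v in V if v not in W)  # all others invalid
            mc[W] = valid * invalid
    return mc


# Illustrative confidence factors for two mass functions.
q = {(1, 1): 0.9, (1, 2): 0.6}
mc = lccm_context_masses(q)
```

With these values the masses are 0.36, 0.06, 0.54, and 0.04 for $\{(1,1)\}$, $\{(1,2)\}$, both, and none respectively, and they sum to 1 as required of a Bayesian mass function.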

Furthermore,  a  numerical  analysis  shows  that  both 
approaches  have  complementary  capacities  when  facing 
adverse  situations,  as  regards  the  relative  intrinsic 
efficiency  and  reliability  of  sensors,  as  well  as  the 
accuracy  of  the  contextual  observations. 


5.4  Illustration 

These methods have been implemented to deal with a
problem of pixel fusion in multispectral image
processing, using the water vapor transmittance as
contextual variable (see for instance [11]). This
application involves two hypotheses ($H_1$ = Asphalt,
$H_2$ = Vegetation), and two sensors ($S_1$ = 2-2.3 µm, $S_2$ = 0.4-0.6 µm).
Previous learning of radiances, as regards a
particular value of the water vapor transmittance,
provides normal distributions of radiances specific to
each sensor and each hypothesis as prior knowledge, so
that the model developed in section 2.2.1 is used.
Besides, every membership function is
determined beforehand thanks to the prior estimation of a
physical relation between the water vapor transmittance
and the mean radiance.

Figure  4  shows  an  example  of  mean  recognition  rates 
obtained  by  LCCM,  probabilistic  solution,  and  both 
sensors  implemented  alone,  when  the  evolution  of  the 
water  vapor  transmittance  is  perfectly  known.  Results  are 
presented  as  functions  of  the  most  sensitive  mean 
radiance  difference  when  the  water  vapor  transmittance  is 
decreasing  from  the  learning  condition  to  0. 


Figure 4. Recognition rate of LCCM, probabilities,
sensor 1, and sensor 2 (abscissa: mean radiance difference)

These results emphasize the robustness brought by the
adaptivity of LCCM.

6  Matching  of  Ambiguous  Observations 

Implementing sensors with orthogonal spatial
resolutions may improve the precision of target
localization, thanks to the conjunction of local
observations. It also provides a better separation between
multiple targets in order to count them or to implement
further analysis such as classification processes. For
instance, such a situation occurs when we associate
either delocalized passive sensors that provide local
angle measurements, or an active sensor that provides
distance and Doppler with a colocalized passive sensor
that provides precise angular measurements.


Nevertheless, dealing with multiple targets in an area of
interest leads to ambiguities in data association, due to
ghosts (erroneous matching), hidden targets,
non-detections, and false alarms, as illustrated in figure 5 in
the case of two delocalized passive sensors.


Figure 5. Ambiguous data association between sensor 1 and
sensor 2 (legend: detection of actual targets, ghosts, hidden
target for sensor 1, false alarm on sensor 1, non-detection on
sensor 2)

To face this problem, a global approach to the detection,
counting, localization, and classification functions is
proposed in the following; it generalizes the
developments provided in [14].

6.1  Formulation  of  the  Problem 

Developments are presented in the case of two sensors,
in order to simplify expressions, but their generalization
to any number of sensors is obvious, and the illustration
in section 6.3 shows the implementation of 4 sensors.

Sensor 1 has $N$ resolution cells $x_1^n$ that are orthogonal
to the $M$ resolution cells $x_2^m$ of sensor 2, as shown in
figure 6. Furthermore $x^{nm}$ stands for the intersection of
cells $x_1^n$ and $x_2^m$.




Each sensor provides a signal in each of its resolution
cells: $s_1^n$ is the signal provided by sensor 1 in cell $x_1^n$,
and $s_2^m$ is the signal provided by sensor 2 in cell $x_2^m$.


Figure 6. Definition of resolution cells


In each resolution cell $x_1^n$ or $x_2^m$, there may be 0, 1, or
several targets. Depending on the application,
a specific logic of target observability has to be
considered. The following developments are valid for
any of these logics, but for simpler explanation a
particular one will be assumed. It is such that when there
are several targets in a cell $x_1^n$ (respectively $x_2^m$), the
signal provided in this cell comes from the target that is
nearest to sensor 1 (respectively sensor 2). All the
others are hidden by this first target, and consequently
no information is available from sensor 1 (respectively 2)
about their presence or identity. Nevertheless, it is
assumed that there is at most one target in an
intersection $x^{nm}$.


On that basis, the process has to find the most likely
distribution of targets, with their identity, over all the
cells $x^{nm}$, as such a conclusion sums up the results of
detection, counting, localization, and identification
functions together. In other words, this consists in
looking for the most likely singleton of:

$E = \{(H^{11}, H^{21}, \dots, H^{N1}, H^{12}, \dots, H^{nm}, \dots, H^{NM})\}$   (6.1)

where $H^{nm} = 0$ if there is no target in $x^{nm}$, and $H^{nm} = i$
if there is a target of identity $i$ in $x^{nm}$.

To achieve such a purpose, three different types of
information are considered:

-  The likelihoods $C_{i1}^n$ and $C_{i2}^m$ of any hypothesis of
identity $i$ are respectively assessed in [0,1] by sensors $S_1$
and $S_2$ on the basis of every signal $s_1^n$ or $s_2^m$ that each
of them observes, and thanks to their suitable prior
learning (probability distributions, fuzzy knowledge, ...).
Furthermore, a confidence factor $q_{i1}^n$ (respectively $q_{i2}^m$)
with values in [0,1] may be associated with each likelihood
$C_{i1}^n$ (respectively $C_{i2}^m$) in order to characterize its
representativity in the context actually met. Moreover,
the general assumptions made in section 2.1 about $C_{ij}$
and $q_{ij}$ are suitable as regards $[C_{i1}^n, q_{i1}^n]$ and
$[C_{i2}^m, q_{i2}^m]$.

-  The similarity $R^{nm}$ of signals $s_1^n$ and $s_2^m$ is assessed
on [0,1], thanks to knowledge about physics and
operational context that allows a similarity relation
between signals coming from a same object to be
characterized (joint probability distribution, fuzzy relation, ...).
Once again, the general assumptions made in section 2.1
about $C_{ij}$ are suitable for $R^{nm}$, as the latter is a trivial
case of the generic problem: 1 source ($S_1$ = comparator),
2 hypotheses ($H_1$ = similarity, $H_2$ = $\neg$similarity), all
$q_{ij} = 1$, and incomplete information (no $C_{21}$ available).

-  A logic of matching is defined to integrate the logic of
target observability introduced above. Such a logic leads
simply to a suitable definition of the frames of
discernment and focal elements that are used to model
any available information. Considering the specific logic
of observability introduced above as example, the
likelihood $C_{i1}^n$ has to express that $s_1^n$ allows
discrimination on:

$E_1^n = \{H_{01}^n, H_{11}^n, \dots, H_{i1}^n, \dots, H_{I1}^n\}$   (6.2)

where $H_{01}^n$ means « no target in $x_1^n$ », and $H_{i1}^n$
means « at least one target in $x_1^n$, and the identity of
the nearest one is $i$ », for every $i$ from 1 to $I$. Similarly,
the likelihood $C_{i2}^m$ has to express that $s_2^m$ allows
discrimination on:

$E_2^m = \{H_{02}^m, H_{12}^m, \dots, H_{i2}^m, \dots, H_{I2}^m\}$   (6.3)

where $H_{02}^m$ means « no target in $x_2^m$ », and $H_{i2}^m$
means « at least one target in $x_2^m$, and the identity of
the nearest one is $i$ », for every $i$ from 1 to $I$.

Furthermore, the similarity of two signals $s_1^n$ and $s_2^m$
means in this specific case that there is a target in $x^{nm}$,
and no other target in front of it, either for sensor 1 or for
sensor 2 (see assumption about hidden targets). So, a
similarity analysis between $s_1^n$ and $s_2^m$ allows
discrimination on $\{A^{nm}, P^{nm}\}$ which is a partition of $E$
defined by:

$P^{nm} = \{(H^{11}, \dots, H^{ij}, \dots, H^{NM})\}$ with:

$H^{ij} \neq 0$ if $i = n$ and $j = m$

$H^{ij} = 0$ if $i = n$ and $j < m$

$H^{ij} = 0$ if $i < n$ and $j = m$

Either $H^{ij} = 0$ or $H^{ij} \neq 0$ if $i \neq n$ and $j \neq m$

Either $H^{ij} = 0$ or $H^{ij} \neq 0$ if $i = n$ and $j > m$

Either $H^{ij} = 0$ or $H^{ij} \neq 0$ if $i > n$ and $j = m$   (6.4)

$A^{nm} = E - P^{nm}$   (6.5)

Figure 7 shows a representation of $P^{nm}$.


Figure 7. Description of $P^{nm}$ (legend: $H^{ij} \neq 0$; $H^{ij} = 0$;
either $H^{ij} = 0$ or $H^{ij} \neq 0$)

6.2  General  solution 

A first mass function $m^{\circ}(.)$ has to be elaborated on the
basis of the « classification information », i.e. all the
likelihoods $C_{i1}^n$ and $C_{i2}^m$ and their confidence factors.
The frame of discernment of $m^{\circ}(.)$ must be $E$, as regards
the purpose of the process.

Nevertheless, according to assumptions of the problem,
the $I+1$ likelihoods $C_{i1}^n$ attached respectively to each of
the $I+1$ hypotheses of identity $H_{i1}^n$ that correspond to a
same resolution cell $x_1^n$ of sensor 1, and their associated
confidence factors, have to generate a distinct mass
function $m_1^n(.)$ on their specific frame of discernment
$E_1^n$. Then, the mass functions $m_1^n(.)$ and their
counterparts $m_2^m(.)$ for sensor 2 must be refined from
their own frame of discernment $E_1^n$ or $E_2^m$ to the
common frame $E$ in order to allow their orthogonal sum
and provide the expected mass function $m^{\circ}(.)$.

The formulation of each $m_1^n(.)$ or $m_2^m(.)$ as a function of
the corresponding $C_{i1}^n$ or $C_{i2}^m$ and the associated
confidence factors has to fit in with the developments
provided in section 2.1, so that it is the orthogonal sum
of mass functions $m_{i1}^n(.)$ (respectively $m_{i2}^m(.)$),
according to $i$, which are expressed by the less specific
model (2.3), (2.4), and (2.5), implemented on $C_{i1}^n$ and
$q_{i1}^n$ (respectively $C_{i2}^m$ and $q_{i2}^m$) for $J = 1$. Furthermore,
all expressions of $C_{ij}$ provided in section 2 according to
the nature of learning and observation (precise value,
probability distribution, or fuzzy membership function)
are suitable for $C_{i1}^n$ and $C_{i2}^m$.

A second mass function $m^{\circ\circ}(.)$ has also to be elaborated
on the basis of the « similarity information », i.e. the
similarity $R^{nm}$ considered for all the hypotheses of
signal association $(s_1^n, s_2^m)$, having regard to matching
logic. The frame of discernment of $m^{\circ\circ}(.)$ must be $E$, as
regards the purpose of the process.

According to the definition of the similarity relations,
each hypothesis of signal association $(s_1^n, s_2^m)$ has to be
considered separately, so that a mass function $m^{nm}(.)$ is
first established on $\{A^{nm}, P^{nm}\}$ thanks to the less
specific generic model expressed by (2.3), (2.4), and
(2.5) in section 2.1, as regards the equivalence between
$R^{nm}$ and the generic problem that has been emphasized
in section 6.1. $m^{\circ\circ}(.)$ is then simply the orthogonal sum
of the mass functions $m^{nm}(.)$, according to $n$ and $m$.

The orthogonal sum of $m^{\circ}(.)$ and $m^{\circ\circ}(.)$ leads finally to a
global mass function on $E$ from which the
expected most likely singleton of $E$ can be selected, thanks to the
decision making rules presented in section 3.
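
The combination can be sketched for two sensors by enumerating the singletons of $E$ and scoring each one with the product of the singleton plausibilities given by (6.6) to (6.14) in the steps of this section. This is an illustrative sketch only: the function names and the numerical values are assumptions, the normalization constants are dropped because they are common to all singletons, and the exhaustive enumeration grows as $(I+1)^{NM}$, so it is only practical for very small grids.

```python
from itertools import product


def nearest_identity(line):
    """Identity of the first occupied cell along a line of sight (0 if empty)."""
    return next((h for h in line if h != 0), 0)


def pl_classification(H, C1, q1, C2, q2):
    """Unnormalized classification plausibility of a singleton H, as in (6.10)."""
    N, M = len(H), len(H[0])
    pl = 1.0
    for n in range(N):                       # sensor 1 looks along index m
        i = nearest_identity(H[n])
        pl *= 1 - q1[n][i] + q1[n][i] * C1[n][i]            # (6.6)
    for m in range(M):                       # sensor 2 looks along index n
        i = nearest_identity([H[n][m] for n in range(N)])
        pl *= 1 - q2[m][i] + q2[m][i] * C2[m][i]            # (6.8)
    return pl


def in_P(H, n, m):
    """True if H belongs to P^{nm} (6.4): a target in x^{nm}, none in front."""
    return (H[n][m] != 0
            and all(H[n][l] == 0 for l in range(m))
            and all(H[k][m] == 0 for k in range(n)))


def pl_similarity(H, R):
    """Unnormalized similarity plausibility of a singleton H, as in (6.13)."""
    pl = 1.0
    for n, row in enumerate(R):
        for m, r in enumerate(row):
            if in_P(H, n, m):
                pl *= r                      # (6.11); A^{nm} contributes 1 (6.12)
    return pl


def rank_singletons(C1, q1, C2, q2, R, identities=1):
    """All singletons of E sorted by decreasing global plausibility (6.14)."""
    N, M = len(C1), len(C2)
    scored = []
    for flat in product(range(identities + 1), repeat=N * M):
        H = [list(flat[n * M:(n + 1) * M]) for n in range(N)]
        score = pl_classification(H, C1, q1, C2, q2) * pl_similarity(H, R)
        scored.append((score, H))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Hypothetical 2 x 2 case, one identity, a single target in x^{nm} with
# n = 0, m = 1 (0-based). C[cell][i] is the likelihood of identity i (0 = empty).
C1 = [[0.05, 0.9], [0.9, 0.05]]
C2 = [[0.9, 0.05], [0.05, 0.9]]
q1 = [[1.0, 1.0], [1.0, 1.0]]
q2 = [[1.0, 1.0], [1.0, 1.0]]
R = [[0.1, 0.95], [0.1, 0.1]]
best_score, best_H = rank_singletons(C1, q1, C2, q2, R)[0]
```

With these values the selected singleton places a single target in the cell with $n = 0$, $m = 1$.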

More precisely, the process consists in the seven
following steps:

-  Step 1 : for the $N+M$ resolution cells of both sensors,
elaboration of the $(I+1)$ mass functions $m_{i1}^n(.)$ or
$m_{i2}^m(.)$ that model the classification information as
introduced above. The corresponding plausibility
functions, that are necessary in the following, are
respectively:

$Pl_{i1}^n(H_{i1}^n) = 1 - q_{i1}^n + q_{i1}^n \cdot C_{i1}^n$   (6.6)

$Pl_{i1}^n(\neg H_{i1}^n) = 1$   (6.7)

and:

$Pl_{i2}^m(H_{i2}^m) = 1 - q_{i2}^m + q_{i2}^m \cdot C_{i2}^m$   (6.8)

$Pl_{i2}^m(\neg H_{i2}^m) = 1$   (6.9)

recalling that all expressions of $C_{ij}$ provided in
section 2 according to the nature of learning and
observation (precise value, probability distribution, or
fuzzy membership function) are suitable for the
elaboration of $C_{i1}^n$ and $C_{i2}^m$.

-  Step 2 : Refining each of these $(I+1) \cdot (N+M)$ mass
functions from $E_1^n$ (respectively $E_2^m$) in $E$.

-  Step 3 : Orthogonal sum of the $(I+1) \cdot (N+M)$ mass
functions in $E$. As a result of steps 2 and 3, the
plausibility function corresponding to the resulting mass
function $m^{\circ}(.)$ can be expressed simply for the singletons
of $E$ (only these terms are necessary in the following):

$Pl^{\circ}(H^{11}, \dots, H^{nm}, \dots, H^{NM}) = (1 - K^{\circ})^{-1} \cdot \prod_n \prod_i Pl_{i1}^n(X_{i1}^n) \cdot \prod_m \prod_i Pl_{i2}^m(X_{i2}^m)$   (6.10)

with:

$X_{01}^n = H_{01}^n$ if $\forall m,\ H^{nm} = 0$

$X_{01}^n = \neg H_{01}^n$ if $\exists m,\ H^{nm} \neq 0$

$X_{i1}^n = H_{i1}^n$, for $1 \le i \le I$, if $\exists m : H^{nm} = i$ and $\forall m' < m,\ H^{nm'} = 0$

$X_{i1}^n = \neg H_{i1}^n$, for $1 \le i \le I$, otherwise

$X_{02}^m = H_{02}^m$ if $\forall n,\ H^{nm} = 0$

$X_{02}^m = \neg H_{02}^m$ if $\exists n,\ H^{nm} \neq 0$

$X_{i2}^m = H_{i2}^m$, for $1 \le i \le I$, if $\exists n : H^{nm} = i$ and $\forall n' < n,\ H^{n'm} = 0$

$X_{i2}^m = \neg H_{i2}^m$, for $1 \le i \le I$, otherwise

and in which $K^{\circ}$ is the combination inconsistency, the
expression of which is not useful for the following.

-  Step 4 : Elaboration of the $N \cdot M$ mass functions $m^{nm}(.)$
that model the similarity information as introduced
above. The corresponding plausibility functions, that are
necessary in the following, are respectively:

$Pl^{nm}(P^{nm}) = R^{nm}$   (6.11)

$Pl^{nm}(A^{nm}) = 1$   (6.12)

-  Step 5 : Orthogonal sum of these $N \cdot M$ mass functions
in $E$. The plausibility function corresponding to the
resulting mass function $m^{\circ\circ}(.)$ can be expressed simply
for the singletons of $E$ (only these terms are necessary in
the following):

$Pl^{\circ\circ}(H^{11}, \dots, H^{nm}, \dots, H^{NM}) = (1 - K^{\circ\circ})^{-1} \cdot \prod_n \prod_m Pl^{nm}(X^{nm})$   (6.13)

with:

$X^{nm} = P^{nm}$ if $H^{nm} \neq 0$,
    $H^{kl} = 0$ for $k = n$ and $l < m$,
    $H^{kl} = 0$ for $k < n$ and $l = m$

$X^{nm} = A^{nm}$ otherwise

and in which $K^{\circ\circ}$ is the combination inconsistency, the
expression of which is not useful for the following.

-  Step 6 : Orthogonal sum of $m^{\circ}(.)$ and $m^{\circ\circ}(.)$ in $E$ to
provide the resulting mass function $m(.)$. The
corresponding plausibility function can be expressed for
the singletons of $E$:

$Pl(H^{11}, \dots, H^{NM}) = (1 - K)^{-1} \cdot Pl^{\circ}(H^{11}, \dots, H^{NM}) \cdot Pl^{\circ\circ}(H^{11}, \dots, H^{NM})$   (6.14)

in which $K$ is the combination inconsistency, the
expression of which is not useful for the following.

-  Step 7 : Selection of the singleton(s) that provide(s) a
maximum of the plausibility function $Pl(.)$, according to
the decision principles discussed in section 3.


6.3  Illustration

The implementation of a very simple example is
described, in order to discuss how this method works.
Four sensors are observing a same area, according to the
situation illustrated in figure 8. Each sensor has only
two resolution cells, and these cells are the same ones,
on the one hand between sensors 1 and 3:

$x_1^1 = x_3^1 = x^{11} \cup x^{12}$

$x_1^2 = x_3^2 = x^{21} \cup x^{22}$

and on the other hand between sensors 2 and 4:

$x_2^1 = x_4^1 = x^{11} \cup x^{21}$

$x_2^2 = x_4^2 = x^{12} \cup x^{22}$


Figure 8. Implementation of sensors


The use of four sensors ensures that a target will always
be seen by at least two sensors with orthogonal
resolution cells in any circumstances, in spite of the
hidden-target phenomenon. So, in principle, the
localization of all the targets is possible. The only
problems that may remain concern the classification, and
mainly the ambiguities in matching, on which we want
to focus. Furthermore, it is assumed that there is only
one possible identity of target, so that the problem of
classification is reduced to a problem of detection
(absence or presence of a target).

The signal in each cell of each sensor is an energy
measurement generated thanks to random variables
which have a normal distribution $N(0,1)$ if there is no
target in the cell, and $N(3,1)$ if there is at least one target
in it. Furthermore, the variables that simulate two
different signals are independent if the signals do not
come from a same target, and the standard deviation of
their difference is constrained to 0.1 if the signals
come from a same target.
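
One possible way to simulate such signals is sketched below. It assumes that two signals coming from a same target are jointly normal with unit variances (the text only gives the marginal distributions and the standard deviation of the difference); in that case $Var(s - s') = 2(1 - \rho)$, so a standard deviation of 0.1 corresponds to a correlation $\rho = 0.995$. Function names are illustrative.

```python
import math
import random

SIGMA_DIFF = 0.1                       # standard deviation of s - s' (from the text)
RHO = 1.0 - SIGMA_DIFF ** 2 / 2.0      # from Var(s - s') = 2 (1 - rho), unit variances


def same_target_pair(rng, mean=3.0):
    """Two N(mean, 1) signals with correlation RHO (assumed jointly normal)."""
    s = rng.gauss(mean, 1.0)
    eps = rng.gauss(0.0, 1.0)
    s_prime = mean + RHO * (s - mean) + math.sqrt(1.0 - RHO ** 2) * eps
    return s, s_prime


def independent_signal(rng, target_present):
    """Signal of a cell that shares no target with another cell."""
    return rng.gauss(3.0 if target_present else 0.0, 1.0)


rng = random.Random(0)
pairs = [same_target_pair(rng) for _ in range(20000)]
diffs = [a - b for a, b in pairs]
mean_diff = sum(diffs) / len(diffs)
std_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / len(diffs))
```

The empirical standard deviation `std_diff` should be close to 0.1, while each signal keeps its $N(3,1)$ marginal distribution.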

In this numerical application the probability
distributions of signals are learned without any problem
of representativity, so that, for the two cells $h$ of any of
the four sensors $k$, the confidence factors $q_{ik}^h$ are equal
to 1 and the likelihoods $C_{ik}^h$ are
given by the model (2.20) on the basis of:

$p(s_k^h / H_{0k}^h) = N(0,1)$

$p(s_k^h / H_{1k}^h) = N(3,1)$

According to their definition (2.19), the corresponding
normalization factors are:

$R_k = (2\pi)^{1/2}$
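
As an illustration, and assuming that model (2.20) takes each likelihood as the learned density scaled by the normalization factor, i.e. $C = R_k \cdot p(s / H)$ (an assumption, since (2.20) is not reproduced here), the two likelihoods of a signal $s$ reduce to $\exp(-s^2/2)$ and $\exp(-(s-3)^2/2)$, both in [0,1]. Function names are illustrative.

```python
import math

R = math.sqrt(2.0 * math.pi)   # normalization factor quoted in the text


def normal_pdf(s, mean, std=1.0):
    """Density of N(mean, std^2) at s."""
    return math.exp(-0.5 * ((s - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def likelihoods(s):
    """Return (C_0, C_1) for a signal s, under the ASSUMED form C = R * p(s / H)."""
    return R * normal_pdf(s, 0.0), R * normal_pdf(s, 3.0)


c0_mid, c1_mid = likelihoods(1.5)   # a signal halfway between the two means
```

A signal at 1.5 gives equal likelihoods to both hypotheses, whereas a signal at 0 or at 3 gives a likelihood of 1 to the matching hypothesis.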

To complete prior knowledge, the similarity relation
between signals coming from a same target expresses
that these signals are « almost equal », in accordance
with the correlation that is simulated between actual
signals (see above). Therefore, the set of associations
$(s_k^h, s_{k'}^{h'})$ of signals coming from a same object is
assumed to be a fuzzy subset of the set of all the possible
associations, that is characterized by the membership
function $\mu(s_k^h, s_{k'}^{h'})$ dedicated to cell $h$ of sensor $k$ and
cell $h'$ of sensor $k'$. $\mu(s_k^h, s_{k'}^{h'})$ is drawn in figure 9 as a
function of $|s_k^h - s_{k'}^{h'}|$.


Figure 9. Similarity relation between $s_k^h$ and $s_{k'}^{h'}$


Simulations have been run for the 16 possible actual
distributions of targets over the 4 cells $x^{nm}$, including
all numbers and positions of targets. A statistical
analysis of results is presented in figure 10. It provides
the mean recognition rate of the right target distribution
(singleton of $E$), for all the possible actual distributions,
as a function of a maximum number of ambiguities, i.e.
the rate of recognition of the right distribution when at
most the number of ambiguities given by the abscissa
may be declared. Ambiguities are multiple declarations
of target distributions, including the right one, when
these distributions lead to the same maximum value of
the criterion. Note that this differs from the notion of
confusion, which characterizes wrong declarations.

Results are shown for three different criteria:

-  Classification criterion : it consists in maximizing
$Pl^{\circ}(.)$, which uses only classification information.

-  Similarity criterion : it consists in maximizing $Pl^{\circ\circ}(.)$,
which uses only similarity information.

-  Global criterion : it consists in maximizing $Pl(.)$,
which uses all the available information (classification +
similarity). It corresponds to the method that has been
elaborated in previous sections.


Figure 10. Recognition rate of target distributions
(ordinate: mean recognition rate)
a - Classification criterion
b - Similarity criterion
c - Global criterion


These results emphasize several behaviors:

-  as expected, the method achieves a good recognition of
target distributions (i.e. at the same time a good
detection, counting and localization), without any
ambiguity.

-  the two types of information processed (classification
and similarity) show a very good complementarity, as
their association provides a much better
performance than either of them alone.

-  classification information has a quite good recognition
capacity, but it suffers from a number of ambiguities that
drastically reduces its performance.

-  similarity information has a very poor recognition
capacity, and furthermore it suffers from ambiguities.
Nevertheless it allows all the ambiguities of
classification information to be rejected, and at the same
time it improves the recognition rate of the latter.


7  Target  Tracking 

The  problem  dealt  with  here  is  that  of  tracking  a  moving 
target  of  any  possible  nature,  in  a  dense  environment, 
using  observations  delivered  by  a  set  of  disparate  and 
possibly  delocalized  sensors.  One  of  the  main  purposes 
is  to  overcome  the  problem  of  spurious  sources  present 
in  the  vicinity  of  the  target.  These  sources  may  be  due  to 
intelligent  countermeasures,  artifacts,  or  vehicles  that  are 
untracked,  for  operational  or  technical  reasons.  A 
situation  of  major  practical  interest  appears  when  the 
tracking  is  initialized  on  objects  that  are  very  close 
together  or  even  at  the  same  point,  such  as  when  a 
fighter  plane  enters  an  airspace  hidden  behind  or  close  to 
an  airliner.  Simultaneous  tracking  of  multiple  targets 
may  also  be  suitably  handled  with  the  proposed 
approach,  as  is  shown  by  the  extension  proposed  in 
section  7.2. 

Unlike classical methods, the proposed concept filters directly on the discriminatory features available in the different resolution cells of each sensor, rather than on plots provided by a detection procedure [15]. It therefore follows a global approach that integrates both tracking and classification functions in a single process.

Although  it  constitutes  no  particular  limitation  on  the 
concept  emphasized,  the  discussion  here  presumes  that 
the  target  tracked  is  the  only  one  of  its  particular  identity 
in  the  space  being  processed,  and  that  a  given  resolution 
cell  contains  at  most  one  target  of  any  given  identity. 

The  technique  used  for  the  filtering  aspects  is  inspired 
directly  from  the  Probabilistic  Data  Association  Filter 
(PDAF)  family  of  methods  developed  by 
Y.  BAR  SHALOM  from  the  ordinary  KALMAN  filter, 
to  handle  multiple  detections  [16].  These  methods  differ 
essentially  from  the  KALMAN  filter  by  the  estimate 
updating  phase,  in  which  they  proceed  in  two  steps  : 

-  First  the  statistical  gating  selects  the  detected  plots 
located  in  a  given  vicinity  of  the  predicted  position.  The 
vicinity  is  determined  so  as  to  contain  the  target  with  an 
a  priori  probability  greater  than  a  given  threshold. 

- Then the estimate and its covariance are updated on the basis of an innovation determined by linear combination of the innovations individually due to each plot retained as a potential successor of the processed track. The weighting coefficients are the a priori probabilities for each of these plots to actually be due to the target, considering the detection and false alarm probabilities of the detector used, the predicted position and its covariance, and the statistical gating threshold.
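The two-step update described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of a squared Mahalanobis distance for the gate, and the fact that the association weights are passed in ready-made are choices made here, not details taken from [16].

```python
import numpy as np

def gate(z_pred, S, plots, threshold):
    """Step 1: keep the plots whose squared Mahalanobis distance to the
    predicted measurement z_pred (innovation covariance S) is within the gate."""
    S_inv = np.linalg.inv(S)
    kept = []
    for k, z in enumerate(plots):
        nu = np.asarray(z, dtype=float) - z_pred
        if float(nu @ S_inv @ nu) <= threshold:
            kept.append(k)
    return kept

def combined_innovation(z_pred, plots, kept, betas):
    """Step 2: linear combination of the individual innovations of the
    retained plots, weighted by their association probabilities (betas).
    The betas may sum to less than 1; the remainder is the probability
    that none of the retained plots is due to the target."""
    nu = np.zeros_like(z_pred, dtype=float)
    for k, beta in zip(kept, betas):
        nu += beta * (np.asarray(plots[k], dtype=float) - z_pred)
    return nu
```

For instance, with a predicted position at the origin, a unit innovation covariance, and a hypothetical gate threshold of 9.21, a plot at (5, 5) is rejected while plots at (1, 0) and (0, 2) are retained and averaged according to their weights.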

In  a  first  approximation,  the  method  proposed  here  can 
be  interpreted  as  a  PDAF  whose  detection  would  operate 
at  minimum  threshold,  with  Detection  Probability  = 
False  Alarm  Probability  =  1.  At  the  level  of  the 
statistical  gating,  then,  this  is  equivalent  to  retaining 
and  processing  one  plot  per  resolution  cell  located 
within  the  vicinity  defined  around  the  predicted  position. 

The  «  a  priori »  probability  that  weights  the  innovation 
due  to  each  of  these  plots  in  updating  the  estimate  is,  on 
the  other  hand,  modified  to  reflect  the  likelihood  of  the 
identity  present  in  the  corresponding  cell,  information 
generated  thanks  to  the  recognition  of  identity  features 
extracted  from  the  signal  isolated  by  the  spatial 
resolution  of  the  sensors. 

The expressions for the filtering, prediction, and statistical gating modules specific to this modified version of the PDAF are given in appendix C. The following discussion concerns the development of the cell weightings needed for the innovation, which requires two notions to be defined:

- The sensors are said to be « aligned » if they break the validation gate down into the same resolution cells. For convenience here, the sensors are assumed to be classed in groups of sensors that are aligned among themselves, while two sensors of two different groups are necessarily unaligned. Each sensor will thus be denoted $S_j^l$, where $l$ designates which of the $L$ groups of aligned sensors the sensor in question belongs to, and $j$ is its sequence number within the group of $J$ sensors.

- If, for a group $l$ of aligned sensors, $x^{ln}$ designates the $n$-th of $N$ resolution cells of non-empty intersection with the validation gate, then the sensors in question « resolve » the gate if the gate entirely includes each $x^{ln}$.


7.1  Procedure  Description 

The extraction of features in each resolution cell $x^{ln}$ by each sensor $S_j^l$ is assumed to provide information of the type considered by the generic model developed in section 2, and therefore by any of the models proposed in that section. So we have $I*J$ mass functions $m_{ij}(.)$ per resolution cell $x^{ln}$, each of them being defined either by (2.3), (2.4), and (2.5), or by (2.6), (2.7), and (2.8). From now on they are denoted $m_{ij}^{ln}(.)$, by reference to the resolution cell $x^{ln}$ to which each of them relates, and their respective frames of discernment are the corresponding partitions of $E$ denoted $E_i^{ln} = \{H_i^{ln}, \neg H_i^{ln}\}$, where the hypothesis $H_i^{ln}$ means that identity $H_i$ is present in cell $x^{ln}$. The models established above are advantageous because they suit the problems generally encountered, but they are not indispensable: the discussion here starts with any given mass functions $m_{ij}^{ln}(.)$, which can be obtained by any other means.

The procedure therefore consists in combining the various sources, each being specific to a sensor $S_j^l$, a resolution cell $x^{ln}$, and a particular evaluated identity $H_i$. The combination is performed in such a way as to provide the likelihood of each possible distribution of identity hypotheses (including target absence) on the $M$ resolution cells $x^m$ of the validation gate. The $x^m$ cells are the intersections of the $x^{ln}$ cells of the various groups $l$ of sensors, so that the combination processes applied offer the best spatial resolution at the end of the process.

Thanks to a special property of evidence theory, the combination of the resulting likelihoods with the a priori localization probabilities $(\alpha^0, \alpha^m)$ of the tracked target delivered by the filter prediction directly generates the a posteriori localization probabilities $(\beta^0, \beta^m)$ of the target in question. The probabilities $\alpha^m$ and $\beta^m$ are relative to the presence of the target in the cell $x^m$ (hypothesis $H^m$), while the probabilities $\alpha^0$ and $\beta^0$ concern its absence from the gate (hypothesis $H^0$). The probabilities $\beta^m$ and $\beta^0$ are used to weight the innovation due to each of the cells $x^m$ in the estimate update, as was introduced above.

Considering the nature of the problem, the required combinations must be performed by orthogonal sum of all of the sources, to obtain their conjunction. This must be done in the finest common frame of discernment, which is the set $E^F$ of the possible identity distributions on the various cells $x^m$ of the validation gate. As the orthogonal sum is commutative and associative, the association order of the various sources is theoretically arbitrary. To simplify the calculations, however, the approach chosen consists in associating the sources by order of decreasing similarity of their frames of discernment, while applying appropriate refinements at each step. Figure 11 shows the resulting logic of operations.


Figure  11.  Combination  processing 


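Developments following this scheme are provided in appendix D. They lead to the expression of the probabilities $\beta^m$ and $\beta^0$ that the filter detailed in appendix C requires:

$$\beta^0 = \frac{\alpha^0}{\alpha^0 + \sum_{m=1}^{M} \alpha^m Q^m} \qquad (7.1)$$

$$\beta^m = \frac{\alpha^m Q^m}{\alpha^0 + \sum_{m'=1}^{M} \alpha^{m'} Q^{m'}} \qquad (7.2)$$

in which:

$$Q^m = \prod_{\substack{l=1 \\ x^m \subseteq x^{ln}}}^{L} Q^{ln} \qquad (7.3)$$

with, for the sensor groups $l$ that resolve the gate:

$$Q^{ln} = \frac{\prod_{j=1}^{J} A_{Ij}^{ln}}{1 - \prod_{i=1}^{I-1} \Big(1 - \prod_{j=1}^{J} B_{ij}^{ln}\Big) + \sum_{i=1}^{I-1} \Big(\prod_{j=1}^{J} A_{ij}^{ln} - \prod_{j=1}^{J} B_{ij}^{ln}\Big)} \qquad (7.4)$$

and, for the sensor groups $l$ that do not resolve the gate:

$$Q^{ln} = \frac{\prod_{j=1}^{J} A_{Ij}^{ln}}{1 - \prod_{i=1}^{I} \Big(1 - \prod_{j=1}^{J} B_{ij}^{ln}\Big) + \sum_{i=1}^{I} \Big(\prod_{j=1}^{J} A_{ij}^{ln} - \prod_{j=1}^{J} B_{ij}^{ln}\Big)} \qquad (7.5)$$

In both cases, $H_I$ designates the identity of the tracked target, and the coefficients $A_{ij}^{ln}$ and $B_{ij}^{ln}$ represent, respectively, the expressions:

$$A_{ij}^{ln} = \frac{m_{ij}^{ln}(H_i^{ln}) + m_{ij}^{ln}(E_i^{ln})}{m_{ij}^{ln}(\neg H_i^{ln}) + m_{ij}^{ln}(E_i^{ln})} \qquad (7.6)$$

$$B_{ij}^{ln} = \frac{m_{ij}^{ln}(E_i^{ln})}{m_{ij}^{ln}(\neg H_i^{ln}) + m_{ij}^{ln}(E_i^{ln})} \qquad (7.7)$$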

The  resulting  filter  will  hereafter  be  designated  the 
Multiple  Signal  Filter  (MSF). 
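As an illustration, the weights of (7.1) to (7.7) might be computed as in the sketch below. It rests on several assumptions: the denominators of (7.4) and (7.5) are taken to be 1 minus the product over identities of (1 minus the product over sensors of B), plus the sum over identities of (product of A minus product of B), with the tracked identity excluded from the product and sum when the group resolves the gate; the function names and the array layout (one row per identity, one column per sensor) are hypothetical; and at least two identities are assumed.

```python
import numpy as np

def a_b(m_h, m_not_h, m_e):
    """Coefficients A and B of (7.6) and (7.7) for one sensor, identity and cell,
    from the masses on H, not-H and the whole frame E."""
    denom = m_not_h + m_e
    return (m_h + m_e) / denom, m_e / denom

def q_cell(A, B, tracked, resolves):
    """Q^{ln} for one cell of one sensor group, as read from (7.4)/(7.5).
    A, B: arrays of shape (I, J); tracked: row index of the tracked identity."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    prod_a = A.prod(axis=1)  # product over the J sensors, per identity
    prod_b = B.prod(axis=1)
    if resolves:
        rows = [i for i in range(A.shape[0]) if i != tracked]
    else:
        rows = list(range(A.shape[0]))
    denom = (1.0 - np.prod(1.0 - prod_b[rows])
             + np.sum(prod_a[rows] - prod_b[rows]))
    return prod_a[tracked] / denom

def msf_weights(alpha0, alpha, Q):
    """A posteriori weights (beta0, beta) of (7.1) and (7.2)."""
    alpha = np.asarray(alpha, dtype=float)
    Q = np.asarray(Q, dtype=float)
    norm = alpha0 + np.sum(alpha * Q)
    return alpha0 / norm, alpha * Q / norm
```

A useful sanity check is the vacuous case: when every mass function assigns all its mass to the whole frame, A = B = 1 everywhere, each Q equals 1, and the a posteriori weights reduce to the a priori ones.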

Let us note that, if we consider only two identity hypotheses (i.e. absence or presence of a target in each cell in question), if the available information is of the probabilistic type used by the models developed in section 2.2.1, and if the distributions are perfectly representative of reality (all $q_{ij}^{ln} = 1$), then there exists a Bayesian solution to the problem. It is easy to verify that this solution is the special case of the proposed filter under these conditions. Such a solution was, for example, used in the PDAFAI to include the amplitude of the observed signal [17].

On  the  other  hand,  as  soon  as  the  number  of  identity 
hypotheses  exceeds  2  (absence/presence  of  the  tracked 
target,  in  the  cell  considered),  no  formal  probabilistic 
approach  is  possible  any  more,  since  the  prediction  can 
no  longer  provide  the  a  priori  probabilities  of  the 
different  identities  needed  for  the  Bayesian  inference  of 
the  update.  One  of  the  advantages  of  the  approach 
proposed  is  therefore  to  obtain  an  exact  solution  for  these 
situations,  which  are  especially  of  concern  here  (see 
introduction  to  section  7).  The  method  described  also 
makes  it  possible  to  manage  the  uncertainty  on  the 
models  and  to  include  data  that  is  not  necessarily 
probabilistic. 


This  concept  also  naturally  provides  track  validation 
criteria,  typically  based  on  the  likelihood  of  the  actual 
presence  of  a  target  of  the  desired  identity  in  the 
validation  gate,  in  consideration  of  the  various  features 
observed. 

It should also be noted that such a filter is by nature suited to a given identity, the purpose of the proposed concept being to reject as effectively as possible those signals that might be due to neighboring targets of different identity. In the track initialization phase, a battery of different filters suitable for different identities should therefore be used. The filter whose identity is most likely can be chosen progressively using track validation criteria. This organization also makes it possible to adopt, for each filter, the most appropriate dynamic model for the identity processed.

Furthermore, in the particular case of unaligned sensors, similarity relations such as those introduced in section 6 can be integrated in the present processing, in order to improve matching of ambiguous observations. The weights $Q^m$ are then simply multiplied by a complementary factor $Pl(H^m)/Pl(H^0)$, where $Pl(.)$ is the plausibility associated with the mass function defined at step 5 of the process described in section 6.2. This holds because $H^m$ and $H^0$, specific subsets of $E$, are entirely included in every focal element that supports them, whatever the initial mass function considered here may be.


7.2  Joint  Tracking  of  Multiple  Targets 

We now propose to extend the single-target concept above to the joint tracking of multiple targets whose validation gates overlap. The purpose is therefore to develop a new Joint Multiple Signal Filter (JMSF) from the Joint Probabilistic Data Association Filter (JPDAF) of Y. BAR SHALOM [16], using the approach that allowed us to establish the MSF from the PDAF. Let $P$ be the number of tracks concerned. All the notations used up to now are conserved, with an added subscript $p$ to indicate the track to which the notation refers.

The formulation sought can be obtained by refining the mass functions defined at the level of each track (see figure 11), from $E_p^F$ to the Cartesian product $E^F = E_1^F \times \ldots \times E_P^F$, and performing their orthogonal sum in this new frame of discernment. The result should then be conditioned and coarsened on the Cartesian product of the frames $\{H_p^0, \ldots, H_p^{M_p}\}$, minus the target position combinations in which more than one target is located in the same resolution cell. The mass function obtained can then be combined with the weighting coefficients $\alpha_p^{m_p}$ that would be used in a JPDAF operating at Detection Probability = 1, similar to those found in (C1) and (C2) of appendix C for the PDAF. These coefficients are actually identical to a priori probabilities in the frame of discernment considered. This leads us to the a posteriori weighting coefficients $\beta_p^{m_p}$ needed for updating the filters associated, respectively, with each track $p$:

$$\beta_p^{m_p} = D^{-1} \sum_{\substack{m_{p'} \in [0, M_{p'}] \\ p' \in [1,P]-\{p\} \\ x^{m_1} \neq \ldots \neq x^{m_P}}} \Big\{ \alpha_p^{m_p} \, Pl_p(H_p^{m_p}) \prod_{p' \in [1,P]-\{p\}} \big[ \alpha_{p'}^{m_{p'}} \, Pl_{p'}(H_{p'}^{m_{p'}}) \big] \Big\} \qquad (7.8)$$

where, by convention, $m_p = 0$ corresponds to a position of target $p$ outside the gate, and where $D$ is the normalization factor that guarantees:

$$\sum_{m_p=0}^{M_p} \beta_p^{m_p} = 1 \qquad (7.9)$$

In practice, (7.8) is therefore expressed:

$$\beta_p^{m_p} = D^{-1} \, \alpha_p^{m_p} \, Q_p^{m_p} \sum_{\substack{m_{p'} \in [0, M_{p'}] \\ p' \in [1,P]-\{p\} \\ x^{m_1} \neq \ldots \neq x^{m_P}}} \; \prod_{p' \in [1,P]-\{p\}} \alpha_{p'}^{m_{p'}} \, Q_{p'}^{m_{p'}} \qquad (7.10)$$

with, for each track $p$, $Q_p^0 = 1$, and $Q_p^{m_p}$ given by (7.3) to (7.7).
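A brute-force reading of (7.10) can be sketched as below: enumerate every joint placement of the P targets, discard the placements that put two targets in the same resolution cell, and accumulate the product of the per-track terms. The function name, the data layout, and the convention that several targets may simultaneously be outside their gates (index 0) are assumptions made for this illustration; the enumeration grows exponentially with P and is only meant to make the structure of the sum explicit.

```python
from itertools import product

def jmsf_weights(alpha, Q, cells):
    """alpha[p][m], Q[p][m]: a priori weight and likelihood factor of track p
    for position index m (m = 0 means outside the gate, with Q[p][0] = 1).
    cells[p][m]: identifier of the resolution cell of position m (None for m = 0).
    Returns beta[p][m], normalised so that each track's weights sum to 1."""
    n_tracks = len(alpha)
    beta = [[0.0] * len(alpha[p]) for p in range(n_tracks)]
    for combo in product(*(range(len(alpha[p])) for p in range(n_tracks))):
        occupied = [cells[p][m] for p, m in enumerate(combo) if m != 0]
        if len(occupied) != len(set(occupied)):
            continue  # two targets in the same cell: combination excluded
        weight = 1.0
        for p, m in enumerate(combo):
            weight *= alpha[p][m] * Q[p][m]
        for p, m in enumerate(combo):
            beta[p][m] += weight
    for p in range(n_tracks):
        total = sum(beta[p])  # plays the role of the normalisation factor D
        beta[p] = [b / total for b in beta[p]]
    return beta
```

With a single track the function reduces to the single-target weights of (7.1) and (7.2); with two tracks competing for one shared cell, the excluded joint placement lowers the weight each track gives to that cell.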

7.3  Illustration 

Two very simple simulations are used to illustrate some of the potential advantages of the MSF [15], with reference to the most suitable classical method. The latter consists of a standard PDAF, associated with a classification before tracking that first declares the identity present in each resolution cell by means of a Maximum Likelihood criterion. It will be denoted PDAFC.

The only attribute considered in each resolution cell is the observed signal level, characterized by its a priori probability density under the various possible identity hypotheses: $N(0,1)$ for no target, and $N(S,1)$ when one is present, with $S$ able to take different values depending on the target considered. So, the modeling developed in section 2.2.1 is used.
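The classification-before-tracking step of the reference method can be illustrated with a minimal sketch, assuming unit-variance Gaussian signal levels as stated above. The function names and the dictionary of identities are hypothetical; the values S = 3, 4, 6 are those used in the simulations.

```python
import math

def gaussian_pdf(x, mean):
    """Density of N(mean, 1) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def ml_identity(level, signal_means):
    """Declare the identity whose density is highest at the observed level."""
    return max(signal_means,
               key=lambda name: gaussian_pdf(level, signal_means[name]))

# Hypothetical set of identities: no target, and targets S = 3, 4, 6.
SIGNAL_MEANS = {"none": 0.0, "S=3": 3.0, "S=4": 4.0, "S=6": 6.0}
```

Because the densities overlap strongly, a hard decision of this kind discards information: an observed level of 5.1 is declared S=6 although S=4 remains quite likely, which is the kind of loss the MSF avoids by working on the likelihoods directly.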

The simulated trajectories are straight and level, at constant speed, approaching head-on the sensors, which are collocated at (0,0). The dynamic model used in the filter is the same as the one that generates the trajectories. The only error introduced at the level of the filter concerns the track position and velocity initializations. The real trajectories are in dotted lines and the estimated ones in solid lines.


Figure  12.  Tracking  with  a  2D  radar  in  (0,0) 


Figure  13.  Tracking  with  1  pulse  radar  and  1  optronic 
imager  in  (0,0) 


In figure 12 three targets ($S=3$, $S=4$, $S=6$) are observed by a 2D radar (azimuth and range), and the one we are trying to track is target $S=4$, which is therefore hemmed in between two targets: one weaker in signal power and the other stronger. Under these particularly difficult conditions, the MSF converges much faster, and on the right target, while the PDAFC can only lock onto the more powerful one, hampered by the unavoidable limitations of its detection phase. This emphasizes the inability of the PDAFC to achieve the compromise between convergence and rejection of spurious sources as well as the MSF does.

In figure 13 radar range measurements are associated with azimuth measurements from an optronic imager. Each of the two targets present is assumed to induce the same signal level in the two sensors, respectively $S=4$ and $S=6$, and the target to be tracked is $S=4$. In this context plot processing induces a ghost phenomenon due to misclassifications of target $S=6$, i.e. association ambiguities between the range measurements and azimuth measurements, of the kind that set off persistent false alarms in the PDAFC, which therefore generates a trajectory lying along the plot of barycenters between the two real trajectories. The MSF, on the other hand, quickly locks onto the right target and tracks it correctly, thanks to its ability to match the measurements better according to the identities of the targets that originate them, and thereby to reject the incorrect identities better.


8  Conclusion 

The theory of evidence proves to be an interesting federative framework for multisensor processing, as it allows data and information of disparate nature to be integrated, thanks to the appropriate modeling that has been elaborated. On this basis, suitable processing methods have been developed to achieve the main functions required in situation assessment, such as classification, matching of ambiguous observations, and tracking.

Their major advantage is better robustness in adverse conditions, thanks to their ability to manage uncertainty, unreliability, and incomplete knowledge. They also make it possible to get the best out of the available information thanks to global functional approaches and centralized upstream data fusion, i.e. fusion performed where data are the most informative, which they are able to achieve in any case.

Furthermore, they provide useful capabilities for multisensor systems implementation, such as management of heterogeneous frames of discernment or integration of contextual knowledge, in order to get the best out of complementary available sensors and ensure suitable adaptivity.


Appendix A : Theory of Evidence, Useful Basic Considerations

The theory of evidence starts from the definition of a frame of discernment $E$ including $I$ exclusive hypotheses $H_i$ ($i \in [1,I]$). $2^E$ is then the set of the $2^I$ subsets of $E$.

A few basic functions defined from $2^E$ to the interval [0,1] make it possible to characterize the likelihood of any subset of $E$:

- the mass function $m(.)$, which represents the likelihood of the singletons belonging to the subset assessed, without possible discernment between these singletons. It is such that:

$$\sum_{A \subseteq E} m(A) = 1 \qquad (A1)$$

$$m(\emptyset) = 0 \qquad (A2)$$

- the credibility function $Cr(.)$, which may be interpreted as a kind of minimal likelihood of a subset, and which is related to the mass function by:

$$Cr(B) = \sum_{A \subseteq B} m(A) \qquad (A3)$$

- the plausibility function $Pl(.)$, which may be interpreted as a kind of maximal likelihood of a subset, and which is related respectively to the mass function and to the credibility function by:

$$Pl(B) = \sum_{A \cap B \neq \emptyset} m(A) \qquad (A4)$$

$$Pl(B) = 1 - Cr(\neg B) \qquad (A5)$$

- the commonality function $Q(.)$, which is defined as:

$$Q(B) = \sum_{A \subseteq E,\ B \subseteq A} m(A) \qquad (A6)$$
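The three derived functions can be computed directly from a mass function. The sketch below assumes a mass function stored as a dictionary mapping `frozenset` focal elements to their masses; this representation and the function names are choices made for the illustration.

```python
def credibility(m, B):
    """Cr(B): total mass of the focal elements included in B, as in (A3)."""
    return sum(v for A, v in m.items() if A <= B)

def plausibility(m, B):
    """Pl(B): total mass of the focal elements intersecting B, as in (A4)."""
    return sum(v for A, v in m.items() if A & B)

def commonality(m, B):
    """Q(B): total mass of the focal elements containing B, as in (A6)."""
    return sum(v for A, v in m.items() if B <= A)

# Example on the frame E = {a, b, c}
E = frozenset("abc")
m_example = {frozenset("a"): 0.5, frozenset("ab"): 0.3, E: 0.2}
```

On this example, `credibility(m_example, frozenset("ab"))` adds the masses of {a} and {a, b}, while the plausibility and the commonality of a singleton coincide, since a focal element intersects a singleton exactly when it contains it.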


The focal elements of a mass function $m(.)$ are the elements $A$ of $2^E$ such that $m(A)$ is not null. When the focal elements are reduced to the singletons $H_i$ of the frame of discernment $E$, then the mass function $m(.)$ is a Bayesian mass function, which is identical to the credibility and the plausibility functions, all of them being identical to the classical notion of probability. When the focal elements are all nested in one another, then the credibility and plausibility functions reduce respectively to the notions of necessity and possibility used in possibility theory.


Combination of distinct sources

The most legitimate combination rule, according to an appropriate axiomatic description, is the conjunction provided by the orthogonal sum, which aggregates $J$ distinct mass functions $m_j(.)$ ($j \in [1,J]$) defined on the same frame of discernment $E$:

$$m(.) = m_1(.) \oplus m_2(.) \oplus \ldots \oplus m_J(.) \qquad (A7)$$

and which is such that:

$$m(A) = (1-K)^{-1} \sum_{A_1 \cap \ldots \cap A_J = A \neq \emptyset} \; \prod_{j=1}^{J} m_j(A_j) \qquad (A8)$$

where $A_j$ stands for any focal element of $m_j(.)$, and $K$ is the inconsistency of the fusion, i.e. the degree of conflict between the different mass functions:

$$K = \sum_{A_1 \cap \ldots \cap A_J = \emptyset} \; \prod_{j=1}^{J} m_j(A_j) \qquad (A9)$$

This rule can be expressed very simply on the basis of the respective commonality functions:

$$Q(A) = (1-K)^{-1} \prod_{j=1}^{J} Q_j(A), \quad A \neq \emptyset \qquad (A10)$$

Thanks to the definitions provided above, (A10) can be applied to the plausibilities of singletons $H_i$.
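The orthogonal sum of (A8) and (A9) can be sketched as follows, again assuming mass functions stored as dictionaries from `frozenset` focal elements to masses (an illustrative representation, with a hypothetical function name). The sketch enumerates every tuple of focal elements, so it is only practical for small frames.

```python
from itertools import product

def orthogonal_sum(*masses):
    """Combine mass functions by (A8); also return the inconsistency K of (A9)."""
    combined = {}
    conflict = 0.0
    for combo in product(*(m.items() for m in masses)):
        intersection = frozenset.intersection(*(A for A, _ in combo))
        weight = 1.0
        for _, v in combo:
            weight *= v
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + weight
        else:
            conflict += weight  # mass that would fall on the empty set
    normalised = {A: v / (1.0 - conflict) for A, v in combined.items()}
    return normalised, conflict
```

For two sources on the frame {a, b}, one supporting a with mass 0.6 and the other supporting b with mass 0.5 (the remainder on the whole frame in both cases), the conflict is 0.6 * 0.5 = 0.3 and the remaining masses are rescaled by 1/0.7. The function is undefined when the conflict equals 1, which corresponds to totally contradictory sources.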

Nevertheless, the implementation of the orthogonal sum is meaningful only if the inconsistency is not too large, i.e. if there is no major conflict between the different sources. Such a conflict may arise in different circumstances, for instance when sources are not reliable, when the frame of discernment is not exhaustive, or when sources are not assessing the same object because of spatially ambiguous observations. Depending on the problem encountered, different attitudes are possible. First, instead of using the inconsistency for normalization, it may be assigned to the empty set, to the whole frame of discernment, or to a further singleton $H_{I+1}$ added to $E$ with deconditioning (see below) on the new frame. Another way consists in implementing a disjunction:

$$m(A) = \sum_{A_1 \cup \ldots \cup A_J = A} \; \prod_{j=1}^{J} m_j(A_j) \qquad (A11)$$

or an adaptive law between conjunction and disjunction:

$$m(A) = \sum_{A_1 \cap \ldots \cap A_J = A} \; \prod_{j=1}^{J} m_j(A_j) + \sum_{\substack{A_1 \cup \ldots \cup A_J = A \\ A_1 \cap \ldots \cap A_J = \emptyset}} \; \prod_{j=1}^{J} m_j(A_j) \qquad (A12)$$

Different  convex  combinations  are  also  possible,  either 
between  conjunction  and  disjunction,  or  directly  between 
source  assessments. 

All  these  laws  are  generally  commutative  (except  the 
convex  combinations),  but  several  are  not  associative 
(nonetheless  the  orthogonal  sum  is  associative). 
Furthermore,  when  applied  to  probabilities  or 
possibilities,  which  are  particular  cases  of  plausibility 
functions,  they  generally  do  not  provide  outputs  of  the 
same  nature  (except  the  orthogonal  sum  applied  to 
probabilities). 

Reliability management

The lack of reliability of a source is managed by discounting its mass function $m(.)$ at level $d$. This operation provides an updated mass function $m_d(.)$ such that:

$$m_d(A) = (1-d)\, m(A), \quad \forall A \subseteq E,\ A \neq E \qquad (A13)$$

$$m_d(E) = d + (1-d)\, m(E) \qquad (A14)$$
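Discounting is a one-line transformation once a representation is chosen. The sketch below assumes the reading m_d(A) = (1 - d) m(A) for A different from E and m_d(E) = d + (1 - d) m(E), with mass functions stored as dictionaries from `frozenset` focal elements to masses; the function name is hypothetical.

```python
def discount(m, d, frame):
    """Discount the mass function m at level d (0 = fully reliable source,
    1 = source ignored), transferring the removed mass to the whole frame."""
    discounted = {A: (1.0 - d) * v for A, v in m.items() if A != frame}
    discounted[frame] = d + (1.0 - d) * m.get(frame, 0.0)
    return discounted
```

With d = 0 the mass function is unchanged, and with d = 1 it becomes vacuous (all the mass on the frame), so a totally unreliable source no longer influences an orthogonal sum.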

Management of frames of discernment

Four basic operations make it possible to manage both the granularity (refining / coarsening) and the exhaustivity (conditioning / deconditioning) of frames of discernment:

(i) A refining $R$ associates to each hypothesis $H_i^1$ of a frame of discernment $E^1 = \{H_1^1, \ldots, H_{I_1}^1\}$ a subset $R(H_i^1)$ of another frame $E^2 = \{H_1^2, \ldots, H_{I_2}^2\}$, such that $\{R(H_1^1), \ldots, R(H_{I_1}^1)\}$ is a partition of $E^2$. So a mass function $m^1(.)$ defined on $E^1$ provides a mass function $m^2(.)$ on $E^2$ thanks to an operation of minimal extension:

$$m^2(R(A)) = m^1(A), \quad \forall A \subseteq E^1 \qquad (A15)$$

(ii) A coarsening $C$ is the inverse operation of a refining $R$ ($C = R^{-1}$), so that a mass function $m^1(.)$ on $E^1$ is obtained from a mass function $m^2(.)$ on $E^2$ thanks to:

$$m^1(A) = \sum_{\substack{B \subseteq E^2 \\ A = \{H_i^1 \,/\, R(H_i^1) \cap B \neq \emptyset\}}} m^2(B) \qquad (A16)$$

(iii) Conditioning makes an assessment more specific by introducing a further piece of information. More precisely, the certainty of a proposition $A$ is expressed by a mass function $m_A(.)$ such that:

$$m_A(A) = 1 \qquad (A17)$$

and the conditioning of an initial mass function $m(.)$ according to this proposition consists in combining it with $m_A(.)$:

$$m(./A) = m(.) \oplus m_A(.) \qquad (A18)$$

(iv) Deconditioning consists in obtaining, from a mass function $m_1(.)$ defined on $E_1$, a mass function $m(.)$ on a frame of discernment $E$ broader than $E_1$ ($E_1 \subset E$). $m(.)$ must be such that its conditioning on $E_1$ is exactly $m_1(.)$. Among all the mass functions that satisfy this condition, $m(.)$ is selected according to the principle of minimum assignment:

$$m(A \cup (E - E_1)) = m_1(A), \quad \forall A \subseteq E_1 \qquad (A19)$$


Appendix  B  :  Statistical  Learning  Modeling 

The search for all the models satisfying the three axioms proposed in section 2.2.1 is conducted by progressively restricting the set of possible models.

Axiom 3 : Consistency With the Bayesian Approach
Development

Let $m_0(.)$ be the mass function representative of information source $S_0$ consisting of the a priori probabilities $P(H_i)$. $m_0(.)$ is then a Bayesian mass function defined by:

$$m_0(H_i) = P(H_i), \quad \forall i \in [1,I] \qquad (B1)$$

$$m_0(A) = 0, \quad \forall A \neq H_i,\ i \in [1,I] \qquad (B2)$$

The desired consistency requires that the orthogonal sum of the mass functions $m_j(.)$ and $m_0(.)$ produces a Bayesian mass function $m_b(.)$ in conformity with the Bayesian inference (2.10) whenever the distributions $p(s_j/H_i)$ are perfectly representative of the densities actually encountered, and thus whenever $q_{ij} = 1$ for any $i$ and $j$. This axiom should, in particular, remain true for any subset of combined sources $S_j$ delimited by $j \in J' \subseteq [1,J]$. Concretely:

$$m_b(.) = \Big\{ \bigoplus_{j \in J'} m_j(.) \Big\} \oplus m_0(.) \qquad (B3)$$

should under these conditions therefore verify:

$$m_b(H_i) = \frac{\big[\prod_{j} p(s_j/H_i)\big]\, P(H_i)}{\sum_{k} \big\{ \big[\prod_{j} p(s_j/H_k)\big]\, P(H_k) \big\}}, \quad \forall H_i \in E \qquad (B4)$$

Moreover, equations (B1), (B2), and (B3) lead to:

$$m_b(H_i) = \frac{\big[\prod_{j} Pl_j(H_i)\big]\, P(H_i)}{\sum_{k} \big\{ \big[\prod_{j} Pl_j(H_k)\big]\, P(H_k) \big\}}, \quad \forall H_i \in E \qquad (B5)$$

By satisfying (B4) and (B5) jointly for any $J'$, we lastly define each $m_j(.)$ by its plausibility function using the $I$ equations:

$$Pl_j(H_i) = K_j \, p(s_j/H_i), \quad i \in [1,I] \qquad (B6)$$

in which $K_j$ is a unique parameter for the $I$ equations, defined simply by:

$$K_j \in \Big[ \Big\{\sum_{i} p(s_j/H_i)\Big\}^{-1},\ \Big\{\max_{i}\,[p(s_j/H_i)]\Big\}^{-1} \Big] \qquad (B7)$$

These bounds on $K_j$ are required only by the intrinsic nature of the idea of plausibility, which has to remain less than unity, while the sum of the values it takes for events constituting a partition of $E$ (the $H_i$ themselves, here) must be greater than unity.

Comments 

The conclusion thus drawn from Axiom 3 calls for a few comments. Firstly, in the general case where $I > 2$, for each value of $K_j$ other than the minimum required by (B7), there exists an infinite number of possible mass functions, defined by a system of $I+1$ equations ($I$ equations (B6) and the sum of the masses equal to 1) with $2^I$ unknowns.

For the minimum value of $K_j$, the result obtained always amounts to a unique, and moreover Bayesian, mass function:

$$m_j(H_i) = \frac{p(s_j/H_i)}{\sum_{k} p(s_j/H_k)}, \quad \forall i \in [1,I] \qquad (B8)$$

$$m_j(A) = 0, \quad \forall A \neq H_i,\ i \in [1,I] \qquad (B9)$$

Of the various solutions obtained for the maximum value of $K_j$, there exists a consonant solution, unique on the set of solutions found, that corresponds to the model proposed by G. SHAFER on the basis of this characteristic alone, for a context similar to that of the present Axiom 3 [1]. To give a practical expression to this solution, let us suppose that the $p(s_j/H_i)$ are arranged such that $p(s_j/H_1) \geq p(s_j/H_2) \geq \ldots \geq p(s_j/H_I)$. The focal elements are the $I$ subsets of $E$:

$$A_i = \bigcup_{k \leq i} H_k, \quad i \in [1,I] \qquad (B10)$$

and the corresponding masses are given by:

$$m_j(A_I) = K_j \, p(s_j/H_I) \qquad (B11)$$

$$m_j(A_i) = K_j \, \{p(s_j/H_i) - p(s_j/H_{i+1})\}, \quad \text{for } 1 \leq i \leq I-1 \qquad (B12)$$

It should nonetheless be pointed out that this last solution does not satisfy Axioms 4 and 5, and that it therefore cannot be retained in the following.

Let us lastly say that, in the ideal case where the distributions $p(s_j/H_i)$ are perfectly representative of the densities actually encountered, a maximum likelihood procedure requires retaining the hypothesis $H_i$ that maximizes $p(s_1, \ldots, s_J/H_i)$. Yet since the hypotheses $H_i$ are singletons of the frame of discernment $E$, and $p(s_1, \ldots, s_J/H_i)$ is the product of the $p(s_j/H_i)$ provided by the $J$ independent sources $S_j$, the plausibility $Pl(H_i)$ obtained after associating the sources $S_j$ is expressed, using (B6), by:

$$Pl(H_i) = K' \, p(s_1, \ldots, s_J/H_i), \quad \forall i \in [1,I] \qquad (B13)$$

in which the coefficient $K'$, independent of $H_i$, integrates the $K_j$ terms and the inconsistency of the combination. To remain consistent with this particular case, any decision procedure to designate the most realistic hypothesis must, for our problem, exclusively maximize a monotonic increasing function of the plausibility $Pl(H_i)$ alone, obtained after combining the sources $S_j$.

Axiom  4  :  Separability  of  Hypothesis  Evaluations 

This axiom consists in considering that each mass function $m_j(.)$ sought is itself the result of a combination between $I$ mass functions $m_{ij}(.)$ ($i \in [1,I]$):

$$m_j(.) = \bigoplus_{i} m_{ij}(.) \qquad (B14)$$

Each mass function $m_{ij}(.)$ has three focal elements ($H_i$, $\neg H_i$, and $E$), whose masses depend only on the value $p(s_j/H_i)$ and the corresponding factor $q_{ij}$.

Since the hypotheses $H_i$ are the singletons of the frame of discernment $E$, the plausibility $Pl_j(H_i)$ is proportional to the product on $k$ of the $Pl_{kj}(H_i)$ associated with the $m_{kj}(.)$. After factorization of the product on $k$ of all the $Pl_{kj}(\neg H_k)$, it is finally expressed:

$$Pl_j(H_i) = K'_j \, \frac{m_{ij}(H_i) + m_{ij}(E)}{m_{ij}(\neg H_i) + m_{ij}(E)}, \quad i \in [1,I] \qquad (B15)$$

in which the factor $K'_j$ is independent of the hypothesis $H_i$ concerned.

Holding to constraint (A6) as required by Axiom 3 will
then permit the probability $p(s_j/H_i)$ alone to be
associated with mass function $m_{ij}(.)$ alone, for $q_{ij}=1$,
only if:

$$\frac{m_{ij}(H_i)+m_{ij}(E)}{m_{ij}(\neg H_i)+m_{ij}(E)} = R_j\,p(s_j/H_i) \tag{B16}$$

in which $R_j$ is a normalization constant independent of
$i$, whose possible values depend only on the
distributions $p(s_j/H_i)$ actually taken into account, as we
shall see in the following. In practice, this constant
allows us to consider the general framework where the
$p(s_j/H_i)$ are known only relatively, i.e. to within a
normalization gain.

Expressed parametrically as a function of the level of
uncertainty $m_{ij}(E)$, (B16) procures the desired mass
function:

$$m_{ij}(H_i) = \frac{R_j\,p(s_j/H_i) - m_{ij}(E)}{1+R_j\,p(s_j/H_i)} \tag{B17}$$

$$m_{ij}(\neg H_i) = \frac{1 - R_j\,p(s_j/H_i)\,m_{ij}(E)}{1+R_j\,p(s_j/H_i)} \tag{B18}$$

$$m_{ij}(E) = f[R_j\,p(s_j/H_i)] \in [0,\ R_j\,p(s_j/H_i)] \tag{B19}$$

in which $f$ is any function verifying simply (B19).

This condition (B19) is required by the mass idea
(included between 0 and 1), which also limits the
possible values of $R_j$ as a function of the distributions
$p(s_j/H_i)$ used, and does so independently of the measures
$s_j$ actually observed:

$$R_j \in \Big[0,\ \big(\max_{s_j,\,i}\{p(s_j/H_i)\}\big)^{-1}\Big] \tag{B20}$$

It  is  furthermore  possible  to  show  that  these  conditions 
are  sufficient  in  order  for  the  coefficient  Kj  in  expression 
(B6),  calculated  for  the  combination  (B14),  to  verify  the 
constraint  (B7).  This  can  be  done  simply  by  showing 
that  the  expression  for  Kj  is  then  an  increasing 
monotonic  function  of  each  whose  extreme  values 

make  it  possible  to  satisfy  the  interval  (B7). 

If we introduce the factor $q_{ij}$ into the expressions (B17),
(B18), and (B19) in terms of discounting, the $m_{ij}(.)$ are
finally given by:

$$m_{ij}(H_i) = q_{ij}\,\frac{R_j\,p(s_j/H_i) - A_i}{1+R_j\,p(s_j/H_i)} \tag{B21}$$

$$m_{ij}(\neg H_i) = q_{ij}\,\frac{1 - R_j\,p(s_j/H_i)\,A_i}{1+R_j\,p(s_j/H_i)} \tag{B22}$$

$$m_{ij}(E) = 1 - q_{ij} + q_{ij}\,A_i \tag{B23}$$

in which $R_j$ is still defined by (B20), and $A_i$ by:

$$A_i = f[R_j\,p(s_j/H_i)] \in [0,\ R_j\,p(s_j/H_i)] \tag{B24}$$

The general expression of the models $m_j(.)$ that satisfy
Axioms 3 and 4 is thus found by (B14) applied to
(B21), (B22), and (B23). An infinite number of
solutions thus still fit our problem.


Axiom  5 :  Consistency  With  the  Probabilistic 
Association  of  the  Sources 

Considering the special structure (B14) of the mass
functions $m_j(.)$ complying with Axioms 3 and 4, and the
associativity of the orthogonal sum, Axiom 5 will be
satisfied for models such that, if the $q_{ij}$ are equal to 1,
the mass function $m_i(.)$ defined by:

$$m_i(.) = \bigoplus_j m_{ij}(.) \tag{B25}$$

$$m_{ij}(.) = F[R_j\,p(s_j/H_i)] \tag{B26}$$

is identical to the mass set $m'_i(.)$ obtained by direct
modeling, using the same function $F(.)$:

$$m'_i(.) = F\Big[\prod_j \{R_j\,p(s_j/H_i)\}\Big] \tag{B27}$$

The $m_{ij}(.)$ verifying (B17), (B18), (B19), and (B20), in
the combination (B25), yield:

$$m_i(H_i) = \frac{VX - YW}{VX + X - YW} \tag{B28}$$

$$m_i(\neg H_i) = \frac{X - YW}{VX + X - YW} \tag{B29}$$

$$m_i(E) = \frac{YW}{VX + X - YW} \tag{B30}$$

with the definitions:

$$V = \prod_j \{R_j\,p(s_j/H_i)\} \tag{B31}$$

$$W = \prod_j \{1 + R_j\,p(s_j/H_i)\} \tag{B32}$$

$$X = \prod_j \{1 + m_{ij}(E)\} \tag{B33}$$

$$Y = \prod_j m_{ij}(E) \tag{B34}$$

and the constraints:

$$m_{ij}(E) = f[R_j\,p(s_j/H_i)] \in [0,\ R_j\,p(s_j/H_i)] \tag{B35}$$

$$R_j \in \Big[0,\ \big(\max_{s_j,\,i}\{p(s_j/H_i)\}\big)^{-1}\Big] \tag{B36}$$

At the same time, the mass set $m'_i(.)$ is written:

$$m'_i(H_i) = \frac{V - m'_i(E)}{1+V} \tag{B37}$$

$$m'_i(\neg H_i) = \frac{1 - V\,m'_i(E)}{1+V} \tag{B38}$$

$$m'_i(E) = f\Big[\prod_j \{R_j\,p(s_j/H_i)\}\Big] \in \Big[0,\ \prod_j \{R_j\,p(s_j/H_i)\}\Big] \tag{B39}$$

in which $V$ is still given by (B31), and the $R_j$ are also
constrained by (B36).




We can go about comparing the mass sets $m_i(.)$ and
$m'_i(.)$ by letting $m_i(E)=m'_i(E)$ in (B30). Then
expressions (B37) and (B38) are equivalent to
expressions (B28) and (B29), respectively, which means
that under all circumstances $m_i(H_i)=m'_i(H_i)$ and
$m_i(\neg H_i)=m'_i(\neg H_i)$.

On the other hand, (B35) and (B39)
will be equivalent for the same function $f$ through (B30),
still under the constraint $m_i(E)=m'_i(E)$, only for the
following two functions $f$:

$$f(x) = 0, \quad \forall x \tag{B40}$$

$$f(x) = x \tag{B41}$$

After examination of Axiom 5, only two models are left
that simultaneously satisfy the three axioms. Both are
defined by (B21), (B22), (B23). They differ by the fact
that $A_i=0$ for one while $A_i=R_j\,p(s_j/H_i)$ for the other, the
$R_j$ being constrained by (B20) in both cases.
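
As an illustration only, the two remaining models can be evaluated numerically from (B21), (B22) and (B23). The Python sketch below is not part of the original development: the function name `mass_function` and its arguments are hypothetical, and it assumes that the factor q lies in [0, 1] and that the product R*p already satisfies (B20).

```python
def mass_function(likelihood, R, q=1.0, model=2):
    """Return (m(H_i), m(not H_i), m(E)) following (B21)-(B23).

    likelihood : p(s_j/H_i)
    R          : normalization factor R_j, assumed to satisfy (B20)
    q          : factor q_ij, assumed here to lie in [0, 1]
    model      : 1 uses A_i = R_j*p(s_j/H_i), 2 uses A_i = 0
    """
    x = R * likelihood
    if not 0.0 <= x <= 1.0:
        raise ValueError("R*p must lie in [0, 1], see (B20)")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q is assumed to lie in [0, 1]")
    A = x if model == 1 else 0.0
    m_h = q * (x - A) / (1.0 + x)            # (B21)
    m_not_h = q * (1.0 - x * A) / (1.0 + x)  # (B22)
    m_e = 1.0 - q + q * A                    # (B23)
    return m_h, m_not_h, m_e
```

With A_i = R_j*p(s_j/H_i) the first component vanishes, so all the belief goes to the negation of H_i and to E; with A_i = 0 the mass left on E is 1 - q only. In both cases the three masses sum to 1.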


Summary of the Models Obtained

There are finally only two models, then, that jointly
satisfy the three desired axioms. Both meet the
decomposition:

$$m_j(.) = \bigoplus_i m_{ij}(.) \tag{B42}$$

Model 1 is particularized by:

$$m_{ij}(H_i) = 0 \tag{B43}$$

$$m_{ij}(\neg H_i) = q_{ij}\,\{1 - R_j\,p(s_j/H_i)\} \tag{B44}$$

$$m_{ij}(E) = 1 - q_{ij} + q_{ij}\,R_j\,p(s_j/H_i) \tag{B45}$$

and Model 2 by:

$$m_{ij}(H_i) = \frac{q_{ij}\,R_j\,p(s_j/H_i)}{1 + R_j\,p(s_j/H_i)} \tag{B46}$$

$$m_{ij}(\neg H_i) = \frac{q_{ij}}{1 + R_j\,p(s_j/H_i)} \tag{B47}$$

$$m_{ij}(E) = 1 - q_{ij} \tag{B48}$$

In both cases, the normalization factor $R_j$ is simply
constrained by:

$$R_j \in \Big[0,\ \big(\max_{s_j,\,i}\{p(s_j/H_i)\}\big)^{-1}\Big] \tag{B49}$$


Appendix C : Filter Expression

The filtering, prediction, and statistical gating modules
are those of a PDAF that would operate at minimum
threshold with Pd=Pfa=1.

Filtering

The $\alpha^m$ coefficients are given by:

$$\alpha^0 = (1-P_g)/C_r \tag{C1}$$

$$\alpha^m = \exp\big[-0.5\,(x^m-x_k)^T\,V_k^{-1}\,(x^m-x_k)\big] \tag{C2}$$

in which:

$$C_r = \pi^{r/2}/\Gamma(1+r/2) \tag{C3}$$

$x_k$ and $V_k$ designate the predicted position and its
covariance, at time $k$. $r$ is the common dimension of $x_k$
and $x^m$. $P_g$ represents the a priori probability that the
target is in the validation gate, considering the choice of
statistical gating threshold $\gamma$.

The $\beta^m$ coefficients are determined from the $\alpha^m$
coefficients by (7.1) to (7.7). The estimated state $X_{k/k}$
and its covariance $P_{k/k}$, which are outputs of the
procedure, are then updated at time $k$ by:

$$X_{k/k} = X_{k/k-1} + G_k\,z_k \tag{C4}$$

$$P_{k/k} = \beta^0\,P_{k/k-1} + (1-\beta^0)\,(1-G_k H)\,P_{k/k-1} + P_k \tag{C5}$$

in which:

$$P_k = G_k\Big[\Big(\sum_{m\neq 0} \beta^m\,z_k^m\,z_k^{mT}\Big) - z_k\,z_k^T\Big]G_k^T \tag{C6}$$

$$z_k^m = x^m - x_k \tag{C7}$$

$$z_k = \sum_{m\neq 0} \beta^m\,z_k^m \tag{C8}$$

$$G_k = P_{k/k-1}\,H^T\,V_k^{-1} \tag{C9}$$

$H$ is the position observation matrix.

Prediction

The predicted state, $X_{k/k-1}$, and its covariance, $P_{k/k-1}$,
used above for updating the filter, are calculated from the
state $X_{k-1/k-1}$ and its covariance $P_{k-1/k-1}$ estimated at
the time of previous observation $k-1$ by the filtering
module:

$$X_{k/k-1} = F\,X_{k-1/k-1} \tag{C10}$$

$$P_{k/k-1} = F\,P_{k-1/k-1}\,F^T + Q \tag{C11}$$

in which $F$ is the state transition matrix from one
observation time to the next one, and $Q$ the noise
covariance matrix on the state.

The predicted position measurement $x_k$ and its
covariance $V_k$, used by the filtering and gating modules,
are then determined by:

$$x_k = H\,X_{k/k-1} \tag{C12}$$

$$V_k = H\,P_{k/k-1}\,H^T + R \tag{C13}$$

with $R$ designating the noise covariance matrix on the
position measurement.
Statistical Gating

The cells $x^{ln}$ and $x^m$ to be processed (figure 11) are
selected by the tests:

$$(x^{ln}-x_k)^T\,V_k^{-1}\,(x^{ln}-x_k) \le \gamma \tag{C14}$$

$$(x^m-x_k)^T\,V_k^{-1}\,(x^m-x_k) \le \gamma \tag{C15}$$

Appendix  D :  Combination  Process  for 
Tracking 

The  elaboration  of  the  combination  processing  suitable 
for  the  tracking  problem  tackled  in  section  7  is 
developed  according  to  the  scheme  provided  in  figure  11. 

Procedure  at  the  Level  of  Each  Resolution  Cell 

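The first step consists in associating the mass functions
$m_{ij}^{ln}(.)$ defined on a same frame of discernment
$E_i^{ln}=\{H_i^{ln}, \neg H_i^{ln}\}$, as regards the various sensors $j$ of a
given alignment group $l$. For each $E_i^{ln}$, their orthogonal
sum directly yields the mass function $m_i^{ln}(.)$ defined
by:

$$m_i^{ln}(H_i^{ln}) = \Big\{\prod_{j=1}^{J}\big[m_{ij}^{ln}(H_i^{ln})+m_{ij}^{ln}(E_i^{ln})\big] - \prod_{j=1}^{J} m_{ij}^{ln}(E_i^{ln})\Big\}\Big/(1-K_i^{ln}) \tag{D1}$$

$$m_i^{ln}(\neg H_i^{ln}) = \Big\{\prod_{j=1}^{J}\big[m_{ij}^{ln}(\neg H_i^{ln})+m_{ij}^{ln}(E_i^{ln})\big] - \prod_{j=1}^{J} m_{ij}^{ln}(E_i^{ln})\Big\}\Big/(1-K_i^{ln}) \tag{D2}$$

$$m_i^{ln}(E_i^{ln}) = \Big\{\prod_{j=1}^{J} m_{ij}^{ln}(E_i^{ln})\Big\}\Big/(1-K_i^{ln}) \tag{D3}$$

in which $K_i^{ln}$ represents the inconsistency of the
combination, the expression of which is not necessary for
the remaining discussion.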
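
The combination (D1) to (D3) can be checked numerically with a few lines of Python. This is only a sketch under stated assumptions: the function name `combine_sources` and the representation of each mass function as a triple (mass on the hypothesis, mass on its negation, mass on the whole frame) are choices made here for illustration, and the normalization 1 - K is obtained as the sum of the three unnormalized masses.

```python
from math import prod


def combine_sources(masses):
    """Orthogonal sum of mass functions with focal elements H, not-H and E.

    masses : list of (m(H), m(not H), m(E)) triples, one per sensor j.
    Returns the combined triple, normalized by 1 - K as in (D1)-(D3).
    """
    pl_h = prod(m_h + m_e for m_h, _, m_e in masses)
    pl_not_h = prod(m_n + m_e for _, m_n, m_e in masses)
    all_e = prod(m_e for _, _, m_e in masses)
    raw_h = pl_h - all_e          # numerator of (D1)
    raw_not_h = pl_not_h - all_e  # numerator of (D2)
    total = raw_h + raw_not_h + all_e  # equals 1 - K
    if total <= 0.0:
        raise ValueError("sources are totally conflicting")
    return raw_h / total, raw_not_h / total, all_e / total
```

For two sensors with masses (0.6, 0.1, 0.3) and (0.5, 0.2, 0.3), the conflict is K = 0.6*0.2 + 0.1*0.5 = 0.17, and the sketch divides by 0.83 accordingly.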

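The refinement of $E_i^{ln}$ in the set $E^{ln}=\{H_1^{ln},\ldots,H_I^{ln}\}$
of the identities attached to the cell $x^{ln}$ allows the
minimum extension, in the common frame of
discernment $E^{ln}$, of the $m_i^{ln}(.)$ relative to these different
identities. The orthogonal sum of the mass functions
obtained leads to the mass function $m^{ln}(.)$. Then all we
have to do is express the plausibilities of $H_I^{ln}$ and of
$\neg H_I^{ln}$, which are all that is needed for the rest of the
discussion:

$$Pl^{ln}(H_I^{ln}) = (1-K^{ln})^{-1}\,\big[m_I^{ln}(H_I^{ln})+m_I^{ln}(E_I^{ln})\big]\,\prod_{i=1}^{I-1}\big[m_i^{ln}(\neg H_i^{ln})+m_i^{ln}(E_i^{ln})\big] \tag{D4}$$

$$Pl^{ln}(\neg H_I^{ln}) = (1-K^{ln})^{-1}\,\big[m_I^{ln}(\neg H_I^{ln})+m_I^{ln}(E_I^{ln})\big]\,\Big[\prod_{i=1}^{I-1}\big[m_i^{ln}(\neg H_i^{ln})+m_i^{ln}(E_i^{ln})\big] - \prod_{i=1}^{I-1} m_i^{ln}(\neg H_i^{ln}) + \sum_{i=1}^{I-1}\Big\{m_i^{ln}(H_i^{ln})\prod_{\substack{i'=1\\ i'\neq i}}^{I-1}\big[m_{i'}^{ln}(\neg H_{i'}^{ln})+m_{i'}^{ln}(E_{i'}^{ln})\big]\Big\}\Big] \tag{D5}$$

in which $K^{ln}$ is the combination inconsistency, the
expression of which is not useful for the following.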

Summary  at  the  Level  of  the  Validation  Gate 

So two cases should be distinguished, depending on
whether the $x^{ln}$ resolve the validation gate or not (see
introduction of section 7). If they do, then the $E^{ln}$
relative to each $x^{ln}$ cell need only be refined in the set
$E^l=E^{l1}\times\ldots\times E^{lN}$ of possible identity distributions on the
cells in question; and the orthogonal sum of the
resulting minimum extensions can be performed. The
very special nature of the associated focal elements, each
being specific to a distinct component of the cartesian
product, allows a relatively simple expression for the
only plausibilities we now have to evaluate on the basis
of (D4) and (D5). These plausibilities concern the $N$
hypotheses $H^{ln}$ of the presence of identity $H_I$
respectively in the cell $x^{ln}$, to the exclusion of any other
cell, and the hypothesis $H^0$ of the absence of identity $H_I$
in the gate. These hypotheses are in fact specific subsets
of $E^l$, as there exists one and only one target of identity
$H_I$ in the gate, according to the axiom adopted to start
with (see introduction of section 7). This leads to:

$$Pl^l(H^0) = \prod_{n=1}^{N} Pl^{ln}(\neg H_I^{ln}) \tag{D6}$$

$$Pl^l(H^{ln}) = Pl^{ln}(H_I^{ln})\,\prod_{\substack{n'=1\\ n'\neq n}}^{N} Pl^{ln'}(\neg H_I^{ln'}) \tag{D7}$$

If the $x^{ln}$ do not resolve the validation gate, an
additional prior refinement should be performed from
each $E^{ln}$ to a set $E'^{ln}=\{H_1^{ln0},H_1^{ln1},\ldots,H_I^{ln0},H_I^{ln1}\}$
to split each hypothesis $H_i^{ln}$ between, on the one hand,
a similar hypothesis $H_i^{ln1}$ simply relative to the part of
$x^{ln}$ covering the gate, and, on the other hand, an
additional hypothesis $H_i^{ln0}$ relative to the part of $x^{ln}$
outside the gate. The operations conducted in the case
where the gate is resolved are then conducted on the
modified set $E^l=E'^{l1}\times\ldots\times E'^{lN}$. However, hypothesis
$H^{ln}$ is now reduced to the presence of the identity $H_I$
just in that part of cell $x^{ln}$ covering the gate, and
excluding any other cell defined in the gate. Hypothesis
$H^0$, though its definition remains unchanged, also
corresponds to a different subset of $E^l$. The result is the
modified expressions:

$$Pl^l(H^0) = 1 \tag{D8}$$

$$Pl^l(H^{ln}) = Pl^{ln}(H_I^{ln}) \tag{D9}$$

The groups $l$ of unaligned sensors are then combined in
both cases by refining the $E^l$ in the common set
$E^F=E^1\times\ldots\times E^L$, and performing the orthogonal sum of
the associated minimal extensions. This step reflects the
fact that we are interested in the intersections $x^m$ of the
cells $x^{ln}$ (see section 7.1). As each of the associated focal
elements here is still specific to a distinct component of
the cartesian product, the plausibilities of $H^0$ and of the
hypotheses $H^m$ of presence of identity $H_I$ in cell $x^m$, to
the exclusion of any other cell, are expressed for the
resulting mass function $m^F(.)$ by:

$$Pl^F(H^0) = \prod_{l=1}^{L} Pl^l(H^0) \tag{D10}$$

$$Pl^F(H^m) = \prod_{\substack{l=1\\ x^m\subset x^{ln}}}^{L} Pl^l(H^{ln}) \tag{D11}$$


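Determination of Weightings $\beta^m$ and Practical
Expression for the Filter

The last step consists in combining this result with the
a priori probabilities $\alpha^m$, which constitute a Bayesian
mass function on $E^0=\{H^0,H^1,\ldots,H^M\}$. As $E^0$ is a
partition of a subset of $E^F$, conditioning and coarsening
$m^F(.)$ from $E^F$ to $E^0$ makes it possible to take the
orthogonal sum with the set of the $\alpha^m$. The resulting
mass function, which is Bayesian over $E^0$, is directly the
set of probabilities $\beta^m$ we are looking for:

$$\beta^0 = \frac{\alpha^0\,Pl^F(H^0)}{\alpha^0\,Pl^F(H^0)+\sum_{m=1}^{M}\alpha^m\,Pl^F(H^m)} \tag{D12}$$

$$\beta^m = \frac{\alpha^m\,Pl^F(H^m)}{\alpha^0\,Pl^F(H^0)+\sum_{m'=1}^{M}\alpha^{m'}\,Pl^F(H^{m'})} \tag{D13}$$

Expressions (D1) to (D13) can then be summarized by:

$$\beta^0 = \frac{\alpha^0}{\alpha^0+\sum_{m=1}^{M}\alpha^m\,Q^m} \tag{D14}$$

$$\beta^m = \frac{\alpha^m\,Q^m}{\alpha^0+\sum_{m'=1}^{M}\alpha^{m'}\,Q^{m'}} \tag{D15}$$

in which

$$Q^m = Pl^F(H^m)/Pl^F(H^0) = \prod_{\substack{l=1\\ x^m\subset x^{ln}}}^{L} Q^{ln} \tag{D16}$$

with, for the sensor groups $l$ that resolve the gate:

$$Q^{ln} = \frac{\prod_{j=1}^{J} A_{Ij}^{ln}}{1-\prod_{i=1}^{I-1}\Big(1-\prod_{j=1}^{J} B_{ij}^{ln}\Big)+\sum_{i=1}^{I-1}\Big(\prod_{j=1}^{J} A_{ij}^{ln}-\prod_{j=1}^{J} B_{ij}^{ln}\Big)} \tag{D17}$$

and, for the sensor groups $l$ that do not resolve the gate:

$$Q^{ln} = \frac{\prod_{j=1}^{J} A_{Ij}^{ln}}{1-\prod_{i=1}^{I}\Big(1-\prod_{j=1}^{J} B_{ij}^{ln}\Big)+\sum_{i=1}^{I}\Big(\prod_{j=1}^{J} A_{ij}^{ln}-\prod_{j=1}^{J} B_{ij}^{ln}\Big)} \tag{D18}$$

In both cases, the coefficients $A_{ij}^{ln}$ and $B_{ij}^{ln}$ represent,
respectively, the expressions:

$$A_{ij}^{ln} = \frac{m_{ij}^{ln}(H_i^{ln})+m_{ij}^{ln}(E_i^{ln})}{m_{ij}^{ln}(\neg H_i^{ln})+m_{ij}^{ln}(E_i^{ln})} \tag{D19}$$

$$B_{ij}^{ln} = \frac{m_{ij}^{ln}(E_i^{ln})}{m_{ij}^{ln}(\neg H_i^{ln})+m_{ij}^{ln}(E_i^{ln})} \tag{D20}$$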
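
The summary expressions (D14) to (D18) lend themselves to a short numerical sketch. The Python below is an illustration under assumptions, not the implementation used for the results: the names `q_ratio` and `beta_weights` are hypothetical, the coefficients (D19) and (D20) are passed as nested lists indexed by identity i then sensor j, and the last row is taken to be the identity I of the tracked target. At least two identities are assumed, since the denominator of (D17) vanishes when I = 1.

```python
from math import prod


def q_ratio(A, B, resolves_gate=True):
    """Q for one cell of one sensor group, following (D17) or (D18).

    A[i][j], B[i][j] : coefficients (D19) and (D20); the last row is
    assumed to correspond to the tracked identity I.
    """
    a = [prod(row) for row in A]  # product over sensors j of A_ij
    b = [prod(row) for row in B]  # product over sensors j of B_ij
    upper = len(a) - 1 if resolves_gate else len(a)  # I-1 in (D17), I in (D18)
    denom = (1.0
             - prod(1.0 - b[i] for i in range(upper))
             + sum(a[i] - b[i] for i in range(upper)))
    return a[-1] / denom


def beta_weights(alpha0, alpha, Q):
    """Return (beta^0, [beta^1, ..., beta^M]) following (D14) and (D15)."""
    norm = alpha0 + sum(am * qm for am, qm in zip(alpha, Q))
    return alpha0 / norm, [am * qm / norm for am, qm in zip(alpha, Q)]
```

By construction the weights returned by `beta_weights` sum to 1, which is what the filter update (C4) to (C8) requires of the beta coefficients.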
Abstract

Three PDE image processing methods are discussed:
Mumford-Shah variational methods, peer group averaging (PGA),
and Osher-Rudin shock filtering. Each of these methods is used
in segmenting images into homogeneous regions separated by
distinct boundaries; reducing the image to regions and
boundaries extracts the image structure in a way that can be
interpreted automatically by parsing algorithms.

Mumford-Shah algorithms approach the problem of segmentation
as one of approximation. The approximation to the image is
represented by the homogenized regions and their boundaries.
The best approximation is found by minimizing an objective
function that controls 1) degree of approximation, 2)
smoothness within regions and 3) extent of region boundaries.
These three objectives are controlled by weight parameters;
choosing these parameters correctly is a problem of major
concern for Mumford-Shah algorithms. Once the parameters are
selected, the objective function can be minimized via an
energy descent method resulting in a nonlinear PDE with the
original image as initial data. Evolution under the PDE
produces the desired approximation and image segmentation.
Recent work by Hewer et al. has implemented this procedure in
a manner that minimizes the parameter selection problem and
greatly reduces the number of descent steps needed for an
acceptable approximation. This reduction is due in part to
the use of PGA as a preprocessing step: the initial image data
is replaced by the PGA-filtered image. Since PGA produces
results that are near the desired Mumford-Shah approximation,
only a few descent steps are required.

Peer group averaging is a discrete approximation method that
starts with the initial image and then makes processing
decisions based on the local peer group. This peer group is
determined by nearness in intensity value: the gray level of
the central pixel of a local window is compared with the other
pixels in the window and the closest ones form the peer group.
The average over the peer group is then used to replace the
central pixel intensity value. This procedure converges
quickly; usually only two or three iterations are needed. In
this method the parameters consist only of the window size and
the number of pixels in the peer group. These parameters are
easily selected for enhancing specific targets.

PGA is closely related to the shock filtering method of Osher
and Rudin. In shock filtering, image information moves outward
from the centers of regions. This outward motion forms
standing shock fronts at the boundaries of regions. The
convective PDE describing this evolution arises naturally in
many situations including flame front propagation and crystal
growth. Shock filtering preserves edge location (unlike many
noise reduction methods such as Gaussian smoothing) and at the
same time maintains the total variation of the image. This
means that it removes noise while enhancing the contrast
across edges. This effect is desirable in ATR applications.
PGA and shock filtering have been shown to be equivalent for
1D signal problems.

Applications are presented to illustrate the Mumford-Shah,
shock filtering and PGA image processing methods.


Paper  presented  at  the  RTO  SCI  Lecture  Series  on  "Application  of  Mathematical  Signal  Processing  Techniques 
to  Mission  Systems”,  held  in  Koln,  Germany,  1-2  November  1999;  Paris,  France,  4-5  November  1999; 
Monterey,  USA,  9-10  November  1999,  and  published  in  RTO  EN-7. 
1 Introduction

In the last decade, a new image processing tool has been
developed that uses partial differential equations to generate
scale space decompositions of an image. Such decompositions
are analogous to the multiresolution decompositions provided
by wavelets or Fourier transforms except that the scale
parameter for PDE processing is the time evolution under the
PDE with the original image as the initial data.

Variational scale space image decompositions are described in
Section 2. This approach utilizes an objective function
$E = E(g,u,B)$ that depends on the original image $g$ as well as
an approximation $u$ and a boundary function $B$. Typically $E$
contains a penalty term that measures the difference between
$g$ and $u$, another penalty term for the nonsmoothness in $u$ and
also a penalty term for the length of the boundaries of the
regions in the image. This latter term is needed to control
the number of region components in the final segmentation:
too many components and the result is not useful.

A variety of methods have been developed to implement this
approach. These include region merging schemes used by
Koepfler and others [16], [17], applying homotopy type methods
to the objective function in order to guarantee convergence of
descent methods to global minimizers, such as the GNC approach
of Blake and Zisserman [4], and using the Euler-Lagrange PDE
associated with the objective function with the boundary $B$
interpreted as a continuous function rather than a binary
process. For general references to variational methods and
PDEs related to image processing see [21] and [29].

Each of these approaches has advantages and disadvantages.
For example, region merging generally produces excellent
results and is easily adapted to a multichannel form that can
accept multiresolution or multispectral image data as input;
however region merging is computationally intensive and may
not be appropriate for real-time applications. Time
considerations also place limitations on the number of
iterations that can be used in steepest descent procedures for
other variational methods. This means that we must usually
forego the luxury of finding the global minimizer of the
objective functional and instead seek an approximation that is
acceptable rather than optimal.

One way of handling this problem is to modify the image prior
to applying the PDE. For example, if we desire a very uniform
approximation with few regions then Gaussian smoothing
provides a fast prefilter. Unfortunately this type of
smoothing degrades edges and can shift their position; this
means that the prefiltered image is somewhat removed from the
variational minimizer and thus the number of PDE descent steps
needed to reach the minimizer may not be reduced sufficiently
for speedy computation.

What is needed then is a prefiltering technique that can
quickly smooth interior regions without degrading or moving
edges. This brings us to peer group averaging (PGA). In this
method the pixel intensities are adjusted based on local peer
groups so that edges are respected. Typically this method
converges very quickly, usually within 2 or 3 iterations. As
such it provides an excellent prefilter for variational
methods and acts as a starting point for a variety of other
applications. PGA is described in Section 3.
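
To make the peer group idea concrete before Section 3, here is a minimal one-dimensional sketch in Python. It follows the description given in the abstract (closest intensities in a local window, averaged to replace the central value), but the details are assumptions made for illustration: the function name `peer_group_average`, the default window and peer group sizes, the truncation of the window at the ends of the signal, and the fact that the central sample counts as a member of its own peer group.

```python
def peer_group_average(signal, window=5, peers=3, iterations=3):
    """One-dimensional peer group averaging (illustrative sketch).

    For each sample, the `peers` values of the centered window that are
    closest in intensity to the central value form the peer group; the
    central value is replaced by the peer group mean.
    """
    half = window // 2
    values = list(signal)
    for _ in range(iterations):
        updated = []
        for i, center in enumerate(values):
            lo, hi = max(0, i - half), min(len(values), i + half + 1)
            # Sort the window by intensity distance to the central sample.
            nearest = sorted(values[lo:hi], key=lambda v: abs(v - center))
            group = nearest[:peers]
            updated.append(sum(group) / len(group))
        values = updated
    return values
```

On a noisy step such as `[0, 0.1, -0.1, 0, 0.05, 1, 1.1, 0.9, 1, 1.05]`, each sample has at least three window neighbours on its own side of the step, so the averaging never mixes the two levels and the edge stays where it was.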

Historically PGA was preceded by a closely related method of
Osher and Rudin called shock filtering. This method uses a
nonlinear convection PDE to propagate information from the
interiors of regions in a way that smooths the region. At the
boundaries of regions a standing shock forms; this leads to
contrast enhancement at the edges and preserves the total
variation of the original image. This method and its
connection to PGA are described in Section 4.

2 Variational Approximation and Boundary Description

A general variational framework for image segmentation and
approximation has been developed by Hewer et al. [14] that
simplifies and systematizes approaches that had previously
been considered separately, especially those with
Mumford-Shah objective functionals [22], [23], [24] and those
considered by Geman and others [10], [11], [12].

To set the stage, suppose that we are given a blurred image $g$
over a domain $\Omega$:

$$g = A u_0 + \eta \tag{1}$$

where $A$ is the blurring operator, $u_0$ is the unblurred image
and $\eta$ is the noise. One approach to segmenting and
approximating such an image consists of finding an
approximation $u$ and a boundary set $K$ that minimizes an
objective functional of the form

$$E(u,K) = w_1 \int_{\Omega\setminus K} (Au-g)^2 + w_2 \int_{\Omega\setminus K} \nabla u\cdot\nabla u + w_3 \int_K d\sigma \tag{2}$$

where the last integral term corresponds to the length of the
boundary. The scalars $w_1$, $w_2$ and $w_3$ are weighting factors
that determine respectively how closely $Au$ approximates $g$,
the smoothness of $u$ and the extent of the boundary. Without
loss of generality we may assume that $w_3 = 1$. Functionals of
this type are often referred to as Mumford-Shah functionals.
See [21] p.24, [22], [23] and [24] for details.

Unfortunately numerical procedures for minimizing the
Mumford-Shah functional encounter bookkeeping problems
associated with tracking regions and their boundaries. These
problems can be traced to the binary nature of the boundary
description as embodied in the boundary characteristic
function $\chi$, which takes on the value 1 on the boundary $K$
and 0 elsewhere. Binary descriptions of boundaries may be
appropriate in some special cases but for most problems the
transitions between regions can occur over several pixels
rather than abruptly. Moreover the mathematical view of the
boundary as the differential of a region (hence the notation
$\partial R$ for the boundary of a region $R$) underscores the
inherent sensitivity of the boundary description process;
this is entirely analogous to the sensitivity of derivatives
with respect to noise.

For these reasons, it often is appropriate to specify
boundaries with a function $B$ taking continuous values between
0 and 1. Such a function might be viewed as a probability
boundary description but we do not explore that issue. Instead
our main concerns are utility and ease of numerical
computation.

To accommodate a continuous boundary function $B$, the
Mumford-Shah functional could be recast as

$$E(u,B) = w_1 \int_\Omega (Au-g)^2 (1-B)^2 + w_2 \int_\Omega \nabla u\cdot\nabla u\,(1-B)^2 + \int_\Omega B^2 \tag{3}$$

where $w_1$ and $w_2$ are scalar weights. Here we have replaced
the integrals over $\Omega\setminus K$ by integrals over $\Omega$ with
integrands multiplied by $(1-B)^2$, the idea being that since
$B \approx 1$ near $K$, the integration of terms times $(1-B)^2$
over $K$ is nearly 0. Similarly the boundary length integral
has been replaced by the integral of $B^2$.

There is a significant amount of related work in image
processing and vision. Early work in this area dealt with
scale space decompositions induced by Gaussian smoothing
operators and the motion of edges (as identified with
zero-crossings of the Laplacian) in scale space. See [18],
[19], [33], [40], and [37].

Identifying spatial discontinuities is helpful in many
applications such as segmentation, optical flow, stereo, and
image reconstruction. The concept of a "line process" is
useful in studying these problems as one of regularization.
The binary line process was introduced by Geman and Geman
[10] where the authors considered simulated annealing based
algorithms for achieving the global optimization. Since then
several modifications of the original scheme have been
suggested. Blake and Zisserman [4] formulated the same problem
as minimizing an objective functional which enforces
smoothness while eliminating the binary line process. See
also Geiger and Girosi [9], Geman and Reynolds [12], and
Rangarajan and Chellappa [30]. Some of these recent works
involve analog or continuous line processes. The connections
between the line process approach to regularization and
outlier processes in robust statistics are explored by Black
and Rangarajan [3].

Common to all these algorithms is an objective functional
that:

(a) enforces closeness to the original data by including
terms such as $(u-g)^2$ or $(Au-g)^2$

(b) promotes local smoothness away from edges by including
terms depending on $\|\nabla u\|$

(c) limits the extent of the boundary.

For example, Richardson [31] and Richardson and Mitter [32]
consider minimizing functionals of the form

$$E_c(u,v) = \int_\Omega \beta(u-g)^2 + \Phi(v)\,\|\nabla u\|^2 + \alpha\Big(c\,\Psi(v)\,\|\nabla v\|^2 + \frac{(1-v)^2}{4c}\Big) \tag{4}$$

where $\alpha$, $\beta$ and $c$ are weighting factors and $v$ is a
continuous function describing the boundary. Ambrosio and
Tortorelli [1], [2] have shown that, for $\Phi(v) = v^2$ and
$\Psi(v) = 1$, this functional "$\Gamma$-converges" as $c \to 0$ to
the following form of the Mumford-Shah functional:

$$E(u,K) = \int_\Omega \beta(u-g)^2 + \int_{\Omega\setminus K} \|\nabla u\|^2 + \alpha|K| \tag{5}$$

where $|K|$ is the length of the boundary $K$.

In  a  similar  vein,  Shah  [36]  proposed  minimizing 
a  pair  of  functionals  dependent  on  u  and  v:  Given  u 
find  v  minimizing 

Vu(v)  =  Ja(  1  -  v)2\\Vu\\  +  |||Vu||2  +  ~  (6) 

where  a  and  p  are  weighting  parameters.  Given  v 
find  u  minimizing 

um  =  /  iiv»ii!  +  m 

J 

where  a  is  a  weighting  parameter. 

The idea of the second functional is that the boundary
function $v$ is approximately 0 inside regions where we want
$u-g$ to be small. Hence the division of $(u-g)^2$ by $v^2$ can
be interpreted as a local weighting that enforces close
approximation of $g$ inside regions. Applying a steepest
descent minimization procedure to these functionals yields a
pair of coupled diffusion PDEs for $u$ and $v$. This is also the
case for the functionals studied by Richardson and Mitter.

However,  as  noted  by  Proesmans  et  al.  [29], 
Shah’s  approach  leads  to  blurring  of  the  edges;  this 
can  be  partially  offset  by  working  with  a  modified 
objective  functional  but  some  blurring  still  remains. 

This blurring effect appears to be induced in part by the
inclusion of the boundary gradient term $\|\nabla v\|$ in the
objective functional, since this results in a diffusion PDE
for $v$. Inclusion of the boundary gradient term also has the
effect of "masking" the boundary. That is, for a given
approximation function $u$ the optimal boundary function $v$ is
the solution of a nonlinear elliptic PDE and cannot be given
explicitly.

In contrast, the objective functionals of the Mumford-Shah
type (3) as well as objective functionals of the type
considered by Geman and Reynolds [12], which extend the work
of Geman et al. [11], do not include a boundary gradient term.

2.1  Reducible  Objective  Functionals 

Consider the following generalized form of the Mumford-Shah
functional (3)

$$E(u,B) = \int_\Omega r\,(1-B)^2 + B^2 \tag{8}$$

where the residual term $r$ depends on $Au-g$ as well as
$\nabla u$. For our purposes we have found the following form
of $r$ to be most useful

$$r = w_1 (Au-g)^2 + w_2 \|\nabla u\| \tag{9}$$

but more general forms of $r$ are also considered below.
Functionals of this type have the big advantage that the
optimal boundary function $B$ can be found explicitly for any
nonnegative residual function $r$: independent of the form of
$r$ we show that, for a given function $u$, the function $B$ that
minimizes $E(u,B)$ is given by

$$B = \frac{r}{1+r} \tag{10}$$

We denote this optimal boundary function by $B = B(u)$. This
allows us to eliminate $B$ from the objective functional and
(after some simple algebra) we are led to the equivalent
problem of minimizing the functional $E(u) = E(u,B(u))$ given
by

$$E(u) = \int_\Omega \frac{r}{1+r} \tag{11}$$

It is interesting that this reduced functional is equal to
the $L_1$ norm of the optimal boundary function $B$; that is,
minimization of the reduced functional is really the same as
minimizing the $L_1$ norm of $B$ subject to $B = r/(1+r)$.

The following lemma shows that there is a unique boundary
function that minimizes $E(u,B)$.

Lemma 1 Let $r = r(u,g,\nabla u)$ be nonnegative. For fixed $g$
and $u$, the objective functional defined by

$$E(u,B) = \int_\Omega r\,(1-B)^2 + B^2 \tag{12}$$

is minimized by setting $B = r/(1+r)$. Moreover, for any $B$

$$E(u,B) \ge \int_\Omega \frac{r}{1+r} \tag{13}$$

with equality only for $B = r/(1+r)$.

Proof:  See  [14]. 
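
The minimizer in Lemma 1 can also be checked pointwise by completing the square in B. The following is only a sketch of one possible argument; it is not the proof given in [14], and it is not claimed to be the exact form of inequality (13):

```latex
\begin{aligned}
r(1-B)^2 + B^2 &= (1+r)B^2 - 2rB + r \\
&= (1+r)\left(B - \frac{r}{1+r}\right)^2 + \frac{r}{1+r}
\;\ge\; \frac{r}{1+r}.
\end{aligned}
```

Since 1 + r > 0 for nonnegative r, equality holds exactly when B = r/(1 + r). Integrating the lower bound over $\Omega$ gives $\int_\Omega r/(1+r)$, which is the $L_1$ norm of the optimal boundary function.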

2.2  Numerical  Implementation 

Once the form of the variational functional has
been selected, the nontrivial problem of finding the
minimizing approximation u has to be addressed.
Typically the desired approximation is an equilibrium
solution of a nonlinear diffusion PDE with certain
boundary conditions. To illustrate, suppose that
we wanted to minimize a functional of the form

$$E(g,u) = \int_\Omega (u - g)^2 + \nabla u \cdot \nabla u \tag{14}$$

where  g  is  the  given  image  and  u  is  an  approximation 
of  g. 

The minimizing approximation u for this functional
satisfies the elliptic equilibrium PDE

$$\Delta u = u - g$$

$$\partial u/\partial n = 0 \quad \text{on } \partial\Omega$$

where $\Delta u$ is the Laplacian of u and $\partial u/\partial n$ denotes
the normal derivative on the boundary $\partial\Omega$.

Numerically we can either solve for the equilibrium
solution directly or follow u as a function of t
from an initial approximation, such as $u_0 = g$, by
integrating the diffusion PDE

$$u_t = g - u + \Delta u \tag{15}$$

subject to the Neumann boundary condition
$\partial u/\partial n = 0$ on $\partial\Omega$. Starting from the initial
condition $u_0$ the image u evolves as $t \to \infty$ toward the
equilibrium solution.
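
As an illustration of the time-stepping route, the following minimal Python sketch integrates (15) by forward Euler on a small grayscale image stored as a list of rows. The function names, the five-point Laplacian with replicated edge values (one simple way to impose the Neumann condition), the unit grid spacing and the step size `dt = 0.1` are assumptions made for this example and are not taken from the paper.

```python
def laplacian_neumann(u):
    """Five-point Laplacian; edge values are replicated so du/dn = 0."""
    rows, cols = len(u), len(u[0])
    lap = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            up = u[max(i - 1, 0)][j]
            down = u[min(i + 1, rows - 1)][j]
            left = u[i][max(j - 1, 0)]
            right = u[i][min(j + 1, cols - 1)]
            lap[i][j] = up + down + left + right - 4.0 * u[i][j]
    return lap


def diffuse(g, dt=0.1, steps=200):
    """Integrate u_t = g - u + Laplacian(u) by forward Euler from u_0 = g."""
    u = [row[:] for row in g]
    for _ in range(steps):
        lap = laplacian_neumann(u)
        u = [[u[i][j] + dt * (g[i][j] - u[i][j] + lap[i][j])
              for j in range(len(g[0]))]
             for i in range(len(g))]
    return u
```

With unit grid spacing the explicit step is only stable for small `dt` (roughly `dt < 2/9` for this stencil), which is why a conservative value is used here.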

The numerical results for this paper were obtained
using the mixed norm objective functional

$$E = \int_\Omega \left( w_1 (Au - g)^2 + w_2 \|\nabla u\| \right) (1 - B)^2 + B^2 .$$

Figure 1: Goldhill image: original (left) and after
variational processing (right).


We used the 1-norm for the smoothness term $\nabla u$
since this produced sharper edges in the approximation
u than the 2-norm. Note that to avoid discontinuous
derivatives at $\nabla u = 0$ we use the modified
smoothness term $(\nabla u \cdot \nabla u + \delta)^{1/2}$ instead of $\|\nabla u\|$.
With this modification, the Euler-Lagrange descent
method for this objective functional yields the following
PDE. For a given approximation u of g, define
the residual

$$r = w_1 (u - g)^2 + w_2 (\nabla u \cdot \nabla u + \delta)^{1/2} \tag{16}$$

then the descent PDE for u is given by

$$u_t = \frac{2 w_1 (g - u)}{(1 + r)^2} + w_2 \nabla \cdot \left[ \frac{\nabla u}{(1 + r)^2 (\nabla u \cdot \nabla u + \delta)^{1/2}} \right] \tag{17}$$

subject to the Neumann boundary condition.

Euler’s method was used to integrate the descent
PDE and we halted the integration when the decrease
in the value of the objective functional became
less than a user-supplied tolerance. Typically we
obtained good results by stopping when the decrease in
the objective functional from one Euler step to the
next was less than 1 percent of the current value of
the objective functional. This PDE descent procedure
may lead to a local minimum for the objective
functional. Other methods, such as simulated
annealing, can be used to find a global minimum for
the objective functional with high probability, but
the intensive computational costs can result in
unacceptably long processing times. See [14] for details.
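
The descent loop described above can be sketched in a few lines. The Python example below is a one-dimensional analogue written for illustration only: it assumes A is the identity, uses a forward difference for the gradient and a backward difference for the divergence (so that the update is the exact gradient of the discrete sum of r/(1 + r)), and the function names, weights, δ, step size `h` and tolerance are all hypothetical choices rather than values from the paper.

```python
import math


def residual_and_flux(u, g, w1, w2, delta):
    """Return the residual r of (16) and the flux inside the divergence of (17)."""
    n = len(u)
    r, flux = [], []
    for i in range(n):
        # Forward difference; zero at the right end (Neumann condition).
        ux = u[i + 1] - u[i] if i + 1 < n else 0.0
        s = math.sqrt(ux * ux + delta)
        ri = w1 * (u[i] - g[i]) ** 2 + w2 * s
        r.append(ri)
        flux.append(ux / ((1.0 + ri) ** 2 * s))
    return r, flux


def energy(u, g, w1, w2, delta):
    """Discrete reduced functional: sum of r / (1 + r)."""
    r, _ = residual_and_flux(u, g, w1, w2, delta)
    return sum(ri / (1.0 + ri) for ri in r)


def descend(g, w1=1.0, w2=1.0, delta=0.01, h=0.02, tol=0.01, max_steps=1000):
    """Euler steps on (17), halted when the decrease drops below tol * energy."""
    u = [float(x) for x in g]
    e_old = energy(u, g, w1, w2, delta)
    for _ in range(max_steps):
        r, flux = residual_and_flux(u, g, w1, w2, delta)
        trial = []
        for i in range(len(u)):
            # Backward difference of the flux gives the discrete divergence.
            div = flux[i] - (flux[i - 1] if i > 0 else 0.0)
            ut = 2.0 * w1 * (g[i] - u[i]) / (1.0 + r[i]) ** 2 + w2 * div
            trial.append(u[i] + h * ut)
        e_new = energy(trial, g, w1, w2, delta)
        if e_new <= e_old:
            u = trial  # accept only steps that do not increase the functional
        if e_old - e_new < tol * e_old:
            break
        e_old = e_new
    return u
```

Rejecting a step that increases the functional is a safeguard added in this sketch; the text itself only specifies the stopping rule.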

Example 1: Figure 1 (left) shows a detail from a
standard image entitled “Goldhill”. Applying the
variational boundary method above using (16) and
(17) produces the results in Figure 1 (right). Note
that the variational approach has suppressed the
small details in the image such as the sheep in the
background and the texture of the roof in the foreground.
The amount of smoothing and detail suppression
is controlled by the weight parameters $w_1$
and $w_2$ in (16). Figure 2 shows the associated boundary
map.

To avoid the need to take many (usually hundreds)
of descent steps in minimizing the objective functional
it is helpful to prefilter the image. The next
section describes a prefilter that we have used
successfully [13].

3  Peer  Group  Averaging 

Peer group averaging is a fast image processing
scheme that enhances objects of a given diameter
and area. The basic idea consists of two steps: to
enhance objects with n or more pixels, 1) identify a
peer group of size n for each pixel, and 2) process the
pixel value based on the characteristics of the peer group.
There are many ways to select the peer group for
a given pixel. For example, see the earlier work by
Yaroslavsky [38] presenting an abstract formulation
of the group idea. In general, peer group members
should share common values. For a single image,
the peer group may be nearby pixels with similar
intensity values. For a sequence of images used in
determining optical flow fields, the peer group can
be nearby pixels (in time and space) with similar
intensity values and similar velocity values. In another
context, texture values may be assigned to each pixel
and then the peer group determined by nearness in
texture space.

Figure 2: Goldhill image variational boundary map.

In this paper we discuss peer groups based on
intensity nearness. For a given image g, select a window
diameter d and a peer group number n. The
selection of d and n should correspond to the size of
the objects that are to be enhanced. The peer group
for a pixel is selected from the window centered at
the pixel and consists of the n pixels whose intensity
values are closest to the center value. Let u be
the average over the peer group. If we let $A_k$ be the
averaging operator at step k we can represent the
PGA iteration as $u_{k+1} = A_k u_k$ where $u_0 = g$. The
PGA iteration is nonlinear because of the peer group
selection.
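
A direct, unoptimized rendering of this iteration might look as follows. The sketch assumes a square d x d window that is clipped at the image border (so border pixels may have fewer than n candidates) and breaks ties in intensity distance by scan order; these details are not specified in the text, and the function names are illustrative.

```python
def pga_step(img, d=3, n=4):
    """One PGA pass: replace each pixel by the mean of its n nearest-intensity peers."""
    rows, cols = len(img), len(img[0])
    half = d // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            center = img[i][j]
            # Collect the window (it contains the center pixel itself).
            window = [img[a][b]
                      for a in range(max(i - half, 0), min(i + half + 1, rows))
                      for b in range(max(j - half, 0), min(j + half + 1, cols))]
            # Stable sort: ties keep their scan order.
            window.sort(key=lambda v: abs(v - center))
            peers = window[:n]
            out[i][j] = sum(peers) / len(peers)
    return out


def pga(img, d=3, n=4, iterations=3):
    """Iterate u_{k+1} = A_k u_k starting from u_0 = img."""
    u = [[float(v) for v in row] for row in img]
    for _ in range(iterations):
        u = pga_step(u, d, n)
    return u
```

Because the peer group is re-selected from the current image on every pass, the averaging operator changes from step to step, which is the nonlinearity noted above.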

Convergence of the PGA iteration is considered
by Deng et al. [6] who show that the PGA iterates
converge to an image that is constant on the interior
regions of the image (the ‘irreducible’ subsets
of Theorem 2 in [6]). A comparison of PGA with
median filtering, the shock filtering method of Osher
and Rudin [26] and morphological filtering is given
in [6].


One of the main features of PGA is that it is a
discrete method designed for images, rather than a
continuous method, such as PDE or variational methods,
that is subsequently adapted to discrete images.
This correspondence between the motivating
derivation and the final application means that the
parameters of PGA are closely aligned to the image
characteristics that we want to enhance. This is dealt
with below where we discuss parameter selection.

3.1 Properties of PGA and Parameter Selection

The most immediate property of PGA processing
is the invariance of objects for properly selected
parameters. That is, if a group of n pixels all have the
same intensity value and the maximum distance between
pixels is equal to r, then by setting the peer
group number equal to n and the window diameter
equal to d = 2r, the common intensity value
of this group of pixels is preserved under PGA processing.
This property is stable with respect to noise
in the following sense. If the intensity values of the
object are perturbed by noise that is small enough
in magnitude so that the membership in the peer
group of the object is not changed, then under PGA
processing the pixel values of the object converge in
one iteration to their collective average. This average
value is equal to the true intensity value of the
object plus the average of the noise over the object.
Suppose that the noise is independently and identically
distributed over the pixels with mean 0 and
standard deviation $\sigma$. Then the mean of the intensity
value assigned to the object under PGA processing
is the same as the true intensity value, with standard
deviation $\sigma/\sqrt{n}$ where n is the number of pixels in
the object. From this we see that PGA processing is
very effective at damping out noise even for objects
with only a small number of pixels.
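
The $\sigma/\sqrt{n}$ figure is the usual standard error of a sample mean. Writing the noisy pixel values of the object as $g_i = c + \varepsilon_i$ (notation introduced here only for this check), with the $\varepsilon_i$ independent, of mean zero and variance $\sigma^2$:

```latex
\operatorname{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} g_i\right] = c,
\qquad
\operatorname{Var}\!\left[\frac{1}{n}\sum_{i=1}^{n} g_i\right]
= \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 = \frac{\sigma^2}{n}.
```

For example, an object of 9 pixels reduces the noise standard deviation by a factor of 3.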

Edge  enhancement  algorithms  have  to  balance 
conflicting  demands.  On  the  one  hand,  edges  of 
important  features  should  be  strengthened  without 
changing  their  location.  At  the  same  time,  we  want 
to  smooth  region  interiors  and  reduce  undesirable 
edges  associated  with  clutter  and  noise. 

The extent to which these opposing goals are met
is determined by the choice of the algorithm parameters.
For PGA there are two parameters: window
diameter d and peer group number n. The conflict
between preserving edges and smoothing unwanted
detail is reflected in the following observations:

1) If n is larger than the number of pixels in an object
O then O will be merged with a larger region of
size at least n. In this case edges associated with O
may be lost or relocated. Thus, to preserve edges in
an object O, use $n \le n(O)$ where n(O) is the number
of pixels in O.

2) To preserve straight lines of pixel width w, use
$n \le wd$. This follows from identifying the line as an
object O and noting that O has at most wd pixels in
a window of diameter d.

3) To preserve corners of interior angle at least $\pi/2$,
identify O with a square corner with point at the
center of the window. Use $n \le d^2/4$.

4) For 1-D signals, maximal smoothing without edge
loss for an object of size n(O) is obtained by setting
the window diameter equal to 2n(O) - 1 and the peer
group number equal to n(O). If the window diameter
is reduced below 2n(O) - 1 while n = n(O), then pixels
just outside the object will average over some of
the object pixels and edge blurring will occur. Taking
n = n(O) and d = 2n(O) - 1 allows each pixel to
select a peer group entirely to the left or entirely to
the right (including the pixel itself) thus preserving
the edges.

5) As the ratio of the peer group number to the window
area increases, the PGA approximation becomes
smoother.
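
Observation 4 is easy to try out on a small 1-D signal. The sketch below is illustrative only: `pga_1d` is a hypothetical helper that clips the window at the signal ends and breaks ties by position, and the test signal is made up.

```python
def pga_1d(signal, d, n):
    """One PGA pass over a 1-D signal with window diameter d and peer group size n."""
    half = d // 2
    out = []
    for i, center in enumerate(signal):
        window = signal[max(i - half, 0):i + half + 1]
        # The n window values closest in intensity to the center value.
        peers = sorted(window, key=lambda v: abs(v - center))[:n]
        out.append(sum(peers) / len(peers))
    return out


step = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 0.0, 0.0, 0.0]  # object with n(O) = 3
kept = pga_1d(step, d=5, n=3)      # d = 2 n(O) - 1
blurred = pga_1d(step, d=3, n=3)   # window too small for n = n(O)
```

Here `kept` reproduces `step` exactly, while `blurred` differs from it at the pixels next to each edge, where the three-pixel window forces pixels on both sides of the edge into the same peer group.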

3.2 Analytic Results on Parameter Selection

The  following  lemma  is  useful  in  analyzing  the 
problem  of  selecting  the  peer  group  number. 

Lemma 3 Let R be a region with n(R) pixels. If $S_1$
and $S_2$ are subsets of R with both $n(S_1)$ and $n(S_2)$
larger than n(R)/2, then $S_1$ and $S_2$ have a nonempty
intersection.

Proof:  See  [6]. 
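
The statement is a counting (pigeonhole) argument. A one-line version, given here as a sketch rather than as the proof in [6]:

```latex
n(S_1 \cap S_2) = n(S_1) + n(S_2) - n(S_1 \cup S_2)
> \frac{n(R)}{2} + \frac{n(R)}{2} - n(R) = 0,
```

where the inequality uses $n(S_1 \cup S_2) \le n(R)$.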

This lemma is related to the idea of local connectivity.
We say that a region R in an image g is locally
connected for the peer group parameters d and n if
the peer group for any pixel i in R has nonempty
intersections with the peer groups of the immediate
neighboring pixels for each iteration of the PGA
algorithm for g. If a region R is locally connected
then there are no sharp edges in R since for any two
neighboring pixels the common portions of their peer
groups ensure that their final values under PGA are
related. The next two lemmas discuss conditions on
the PGA parameters that lead to local connectivity
for all or part of the image.

Lemma 4 If the peer group number n is large relative
to the window diameter d, then the entire image
is locally connected under PGA. This occurs if
n > (d + 1)/2 for 1-D signals and if n > d(d + 1)/2 for
2-D images where we assume that d is odd.

Proof:  See  [6]. 

From the above it is clear that for a fixed window
size, as the peer group number increases so does the
smoothing. The next lemma discusses how the peer
group number affects smoothing within regions. The
idea here is to recover objects under PGA approximation
in the sense that all the pixel values of the
object are locally connected to each other. However
we also want to avoid connections with pixels outside
the object. If n > n(O) where n(O) is the number
of pixels in the object O, then the peer groups for
pixels in O will be forced to include pixels outside
of O, resulting in edge blurring. On the other hand,
if the peer group number is too small, then slight
variations within O can lead to O breaking up into
several smaller unconnected regions under PGA.

To analyze this problem we will make the simplifying
assumption that the object O is well-separated
from the rest of the image in the sense that for any
peer group number n < n(O) and any pixel $i \in O$,
the peer group P(i) is a subset of O.

Lemma 5 Assume that O is an object in an image g
that is well-separated from the rest of g. If the peer
group number n satisfies n(O)/2 < n < n(O) then
O is locally connected under PGA.

Proof:  See  [6]. 

3.3  Automatic  Parameter  Selection 

Although the preceding observations make it possible
to predict in a general way how the peer group
size affects the smoothing under PGA, it is still the
case that in most images we want to vary the peer
group size from point to point in order to enhance
some features and smooth others. For example, if we
use a 3x3 window then a peer group of size 6 preserves
straight edges but not corners. If we lower the
peer group number to size 4 then corners are also
preserved but we don’t achieve the smoothing that
we see with n = 6.

To get around this problem Deng et al. [5] introduced
the idea of using the Fisher discriminant to
select the peer group for each pixel. That is, for a
particular pixel let $g_1, g_2, \ldots, g_m$ be the intensity values
over the window with $g_c$ the intensity of the central
pixel. Form the intensity differences $d_i = |g_i - g_c|$.
Use the Fisher discriminant to separate these differences
into 2 groups. That is, maximize the objective
functional

$$J(k) = \frac{(a_1 - a_2)^2}{v_1 + v_2}$$

over the peer group number k, where $a_1$ and $v_1$ are
the average and variance over the first group and
$a_2$, $v_2$ are the average and variance of the second
group.
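
One way to carry out this selection is sketched below. The sketch makes several assumptions that are not stated in the text: the differences are sorted in increasing order so that the first group consists of the k smallest, the objective is taken to be the standard Fisher criterion (a1 - a2)^2 / (v1 + v2) with population variances, and a small constant `eps` guards against a zero denominator. The function name is hypothetical.

```python
def fisher_peer_group_size(values, center, eps=1e-12):
    """Return the split size k that maximizes the Fisher criterion."""
    diffs = sorted(abs(v - center) for v in values)
    best_k, best_j = 1, -1.0
    for k in range(1, len(diffs)):
        first, second = diffs[:k], diffs[k:]
        a1 = sum(first) / len(first)
        a2 = sum(second) / len(second)
        v1 = sum((x - a1) ** 2 for x in first) / len(first)
        v2 = sum((x - a2) ** 2 for x in second) / len(second)
        j = (a1 - a2) ** 2 / (v1 + v2 + eps)
        if j > best_j:
            best_k, best_j = k, j
    return best_k


# A window straddling an edge: five values near 10, four values near 50.
window = [10, 10, 11, 9, 10, 50, 52, 49, 51]
k = fisher_peer_group_size(window, center=10)
```

The peer group for the pixel is then the k window values closest to the center intensity, so the group size adapts to the local edge structure instead of being fixed in advance.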

This procedure produces excellent results with
only a slight increase in processing time to maximize
the Fisher discriminant.

Figure 3: Goldhill image: Fisher-PGA (left) and
variational processing (right).

3.4 PGA as a Preprocessor for Variational Approximation

In general, one selects the original image g as initial
data for the descent PDE associated with the
variational objective functional. However, this can
lead to the need to take many (hundreds of) descent
steps in order to achieve minimization. To avoid this
problem we used PGA as a preprocessing step to generate
an initial image for the descent PDE; for details
see [13].

Example 2: We applied PGA to the Goldhill image
using automatic parameter selection via the Fisher
discriminant; see Figure 3 (left). Note that the
results are almost indistinguishable from the variational
processing in Figure 3 (right). Since the
PGA processing is much faster than the variational
method we obtain considerable computational savings
by using PGA alone or as a preprocessor for the
variational method.

3.5  Multiscale  PGA 

One problem with PGA is the limitation to small
windows for computational speed. In particular it
would be nice to be able to obtain uniform smoothing
over large regions without having to use large
windows and peer groups. To achieve this a multiscale
PGA procedure similar in spirit to multigrid
methods for solving large systems of linear equations
has been developed [6]. The basic idea is to work
on several levels by defining windows with skips between
pixels. At the first level is the usual window
with a distance of 1 between pixels; the next level
has a distance of 2 between pixels, etc. Alternating
the PGA iteration between levels results in speeding
the passage of intensity information within regions.
Fortunately there is a simple way to implement this
procedure. For example, to do a PGA iteration with
a distance of 2 between pixels in each window, one
simply has to subsample the image skipping every
other pixel and then run regular PGA on the subsampled
image. Subsampling this way transforms a
large image into 4 smaller images; after running PGA
on each of the smaller images they are then recombined
into a larger image. In this way we may do
one iteration of PGA on the large image followed by
one iteration on the smaller images and repeat until
the process converges. Convergence is usually quick
(3 to 5 iterations). Further subsampling of the small
images can be done if desired.
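
The subsample-and-recombine step can be sketched as below. The helper names are hypothetical, and `smooth` stands for any single-level filter (for instance one regular PGA iteration) supplied by the caller; it is assumed to return an image of the same shape as its input.

```python
def split_into_four(img):
    """Subsample by skipping every other pixel; keys are (row, column) offsets."""
    return {(a, b): [row[b::2] for row in img[a::2]]
            for a in (0, 1) for b in (0, 1)}


def recombine(parts, rows, cols):
    """Interleave the four subsampled images back into a rows x cols image."""
    out = [[None] * cols for _ in range(rows)]
    for (a, b), sub in parts.items():
        for i, sub_row in enumerate(sub):
            for j, value in enumerate(sub_row):
                out[a + 2 * i][b + 2 * j] = value
    return out


def coarse_level_pass(img, smooth):
    """Apply `smooth` at pixel spacing 2 by filtering each subsampled image."""
    rows, cols = len(img), len(img[0])
    parts = split_into_four(img)
    # Empty sub-images (possible for one-row or one-column inputs) are skipped.
    filtered = {key: (smooth(sub) if sub and sub[0] else sub)
                for key, sub in parts.items()}
    return recombine(filtered, rows, cols)
```

A multiscale iteration in the sense described above would alternate one call of the single-level filter on the full image with one call of `coarse_level_pass`, repeating until the result stops changing.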

Example 3: Figure 4 (left) shows a detail from a
satellite image of an agricultural area. This image
is heavily contaminated by speckle and background
clutter. Applying multiscale PGA eliminates the
speckle in just 3 iterations as seen in Figure 4 (right).

4  Shock  Filtering 

In shock filtering [34], [26], [27], intensity values
from the interior of regions move outward towards
the region edges along gradient lines. The convexity
of the intensity along the gradient direction determines
the motion direction along the gradient and
this direction assignment means that when two regions
meet at an edge the image intensity will experience
a jump. Thus the edges in the image correspond
to stationary shock fronts.

Figure 4: Agricultural image: original (left) and after
multiscale PGA (right).

In shock filtering the maximum values of the image
intensity and the minimum values move outward
from the interior of their regions to meet at
the boundaries. This means that the contrast at the
edges is maximized. This also means that shock filtering
preserves the total variation of the original
image.

Shock filtering smooths in the sense that each region
assumes a constant value. However, shock filtering
does not remove isolated noise such as
salt-and-pepper noise, as discussed by Osher and Rudin
in [26].

In its simplest form for 1-D signals, shock filtering
uses the original signal g as initial data for a nonlinear
convection equation:

$$u_t = -\operatorname{sgn}(u_{xx})\, u_x$$

with u(x, 0) = g(x). In this formulation we must be
careful to form derivative approximations from the
appropriate direction. Thus if intensity information
is to move from right to left, then we want $u_x$ to
represent the righthand derivative and we use a forward
difference to approximate $u_x$. Similarly we use
a backward difference if we want intensity information
to move from left to right.

Consider a simple Euler update scheme for the
shock filter equation: let h be the time step and set
$u_i^{new} = u_i + h u_t$. If u is monotone increasing at i and
$u_{xx} < 0$ in the sense that $u_{i+1} - 2u_i + u_{i-1} < 0$ then
the choice h = 1/2 leads to $u_i^{new} = (u_i + u_{i+1})/2$.
This is the same result we would get with PGA for
a peer group of size n = 2 because the convexity
condition $u_{i+1} - 2u_i + u_{i-1} < 0$ implies that
$|u_{i+1} - u_i| < |u_i - u_{i-1}|$. Similarly, if $u_{xx} > 0$ the choice h =
1/2 in the shock filter Euler update leads to the same
result as the PGA update: $u_i^{new} = (u_{i-1} + u_i)/2$.
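
The coincidence of the two updates can be checked numerically. The following Python sketch implements one Euler step of the 1-D shock filter with the upwind differences described above, next to a PGA step with n = 2 on a three-pixel window. The function names and the test signal are made up for illustration; endpoints are left unchanged, and pixels with zero second difference (or equidistant neighbors) are simply not updated, which is an assumption of this sketch.

```python
def shock_step(u, h=0.5):
    """One Euler step of u_t = -sgn(u_xx) u_x with upwind differences."""
    new_u = list(u)
    for i in range(1, len(u) - 1):
        uxx = u[i + 1] - 2.0 * u[i] + u[i - 1]
        if uxx < 0:
            # u_t = +u_x: information moves right to left, forward difference.
            new_u[i] = u[i] + h * (u[i + 1] - u[i])
        elif uxx > 0:
            # u_t = -u_x: information moves left to right, backward difference.
            new_u[i] = u[i] - h * (u[i] - u[i - 1])
    return new_u


def pga2_step(u):
    """One PGA step with n = 2: average each pixel with its nearer neighbor."""
    new_u = list(u)
    for i in range(1, len(u) - 1):
        left = abs(u[i] - u[i - 1])
        right = abs(u[i + 1] - u[i])
        if right < left:
            new_u[i] = (u[i] + u[i + 1]) / 2.0
        elif left < right:
            new_u[i] = (u[i - 1] + u[i]) / 2.0
    return new_u


ramp = [0.0, 1.0, 4.0, 9.0, 10.0, 10.5]  # monotone increasing test signal
```

On a monotone increasing signal such as `ramp`, `shock_step(ramp)` and `pga2_step(ramp)` return the same list.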

This intersection of shock filtering and PGA for
particular parameter choices means that results for
one method apply immediately to the other. For example,
PGA with n = 2 for signals is total variation
preserving because the same is true for shock filtering.
However, the two methods are not the same
for other choices of parameters. In particular PGA
with larger peer group sizes automatically incorporates
smoothing over the peer group and is able to
handle problems such as the isolated intensity spikes
of salt-and-pepper noise.

Example 4: This is a 1-D signal example consisting
of two steps of different heights and widths together
with Gaussian noise (see Figure 5a). Figure
5b shows the exact signal with noise added. After
using shock filtering we were able to reconstruct
the signal almost exactly (Figure 5c). We also note
that a similar example has been studied by Oman
[25] using a variety of approximation methods including
Sobolev $H^1$ reconstruction, total variation
approximation, low pass Fourier reconstruction, and
wavelet methods (in which denoising in the manner
of Donoho and Johnstone [7] was used for Haar and
Daubechies wavelets). The shock filtering results are
superior to (or approximately the same in the case of
the total variation method) the results reported by
Oman.

Figure 5: Exact step signal (a), noisy signal (b), shock
filter reconstruction (c).

The correspondence between PGA and shock filtering
is helpful in analyzing the stability of an accelerated
version of PGA.

4.1 Shock Filtering and Accelerated PGA

In analogy with the SOR method of solving large
systems of linear equations we can accelerate the iterative
PGA method using

$$u^{k+1} = \omega \bar{u}^k + (1 - \omega) u^k$$

where $\bar{u}^k$ denotes one regular PGA iteration applied
to $u^k$ and $\omega$ is a scalar between 0 and 2. The stability
of this method is rather easily analyzed in the one
dimensional case because of the correspondence with
shock filtering.

For a convection equation of the form

$$u_t = c\, u_x \tag{18}$$

the Courant stability condition (cf. Ferziger [8],
p. 237) relates the speed of transmission c to the ratio
of the spatial step size dx and the temporal step size
dt for Euler’s method:

$$|c| \le \frac{dx}{dt} \tag{19}$$

This can be interpreted as follows: |c| is the speed
with which information moves in the solution because
the solution to the convection equation has the
form u(x, t) = u(x + ct). On the other hand, the ratio
dx/dt is the speed with which information moves
in the Euler approximation. The Courant condition
thus requires that for Euler’s method to be stable,
information in the approximation must move at least
as fast as it does in the true solution. Applying this
to the shock filtering equation with dx = 1 (i.e., one
nodal distance) and |c| = 1 we have the stability
condition

$$0 < dt \le 1. \tag{20}$$

We can connect the shock filtering stability condition
with the accelerated PGA method as follows.
Under the same assumptions as for the shock filtering
case, i.e., u is monotone increasing at i and
$u_{xx} < 0$ in the sense that $u_{i+1} - 2u_i + u_{i-1} < 0$, we have
$|u_{i+1} - u_i| < |u_i - u_{i-1}|$. This means that the peer
group for n = 2 consists of the pixels i and i + 1.
This gives the peer group average as

$$a_i = \frac{u_{i+1} + u_i}{2} \tag{21}$$

Substituting this in the accelerated PGA scheme
gives

$$u_i^{accel} = \omega a_i + (1 - \omega) u_i = \omega \left( \frac{u_{i+1} + u_i}{2} \right) + (1 - \omega) u_i = \frac{\omega}{2} u_{i+1} + \left( 1 - \frac{\omega}{2} \right) u_i .$$

This is the Euler update $u_i + dt\,(u_{i+1} - u_i)$ of the
shock filter with time step $dt = \omega/2$.

In the analysis of the shock filtering we assumed
that $u_{xx} < 0$. The case for $u_{xx} > 0$ can be handled
in the same way except that the averaging is to the
left instead of the right. In any case we have the
result that the accelerated PGA method with n = 2
is stable for 1-D signals if $0 < \omega \le 2$ since this range
is equivalent to the stability condition $0 < dt \le 1$ for
the shock filtering problem.
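
The identification of the relaxation parameter with twice the Euler time step can be verified on a small example. The sketch below covers only the case treated in the derivation (nearer neighbor at i + 1, forward difference); the function names and the sample signal are illustrative assumptions.

```python
def euler_shock_update(u, i, dt):
    """Forward-difference Euler update at i for the case u_xx < 0 (u_t = u_x)."""
    return u[i] + dt * (u[i + 1] - u[i])


def accelerated_pga2_update(u, i, omega):
    """Accelerated PGA update at i for n = 2 when the nearer neighbor is i + 1."""
    peer_average = 0.5 * (u[i + 1] + u[i])
    return omega * peer_average + (1.0 - omega) * u[i]


signal = [0.0, 4.0, 7.0, 9.0, 10.0]  # increasing, negative second differences
for omega in (0.5, 1.0, 1.5, 2.0):
    for i in range(1, len(signal) - 1):
        a = accelerated_pga2_update(signal, i, omega)
        b = euler_shock_update(signal, i, omega / 2.0)
        assert abs(a - b) < 1e-12  # same update with dt = omega / 2
```

For `omega = 2` the update returns `signal[i + 1]` itself, which is the substitution form discussed next.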

Accelerated PGA for images takes a particularly
simple form for $\omega = 2$ and n = 2. Indeed if $\omega = 2$ and
n = 2, then the update scheme for pixel (i, j) consists
of simply selecting the pixel value from the window
that is closest in intensity to the central pixel’s intensity:

$$u_{ij}^{accel} = u_{i'j'} \tag{22}$$

where $(i', j') \ne (i, j)$ minimizes $|u_{i'j'} - u_{ij}|$ over the
window.

This means that the update does not require averaging;
rather it takes the form of a substitution. For
this reason it is stable in the sense that the accelerated
pixel value must lie in the interval $[u_{\min}, u_{\max}]$ where
$u_{\min}$ and $u_{\max}$ are the minimum and maximum
intensities over the window. The only computation
involved is in selecting which pixel value to use in
the substitution. If we take the ‘window’ to be the
four closest pixel values, $|i - i'| + |j - j'| = 1$, then
this choice can be made at a cost of only two flops
per pixel. At interior points in the image:

Step 1 Form the x forward difference matrix $DX_{ij} = |u(i+1, j) - u(i, j)|$.

Step 2 Form the y forward difference matrix $DY_{ij} = |u(i, j+1) - u(i, j)|$.

Using DX and DY we can determine the substitution
values for each pixel using three comparisons per
pixel.
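
A straightforward version of the substitution update (22) on the four-neighbor window can be written as below. This sketch does not reproduce the DX/DY bookkeeping of Steps 1 and 2; it simply compares the available neighbors directly, treats border pixels by using whichever neighbors exist, and breaks ties in favor of the first neighbor examined. The function name and these conventions are assumptions.

```python
def substitute_nearest(img):
    """Replace each pixel by the 4-neighbor value closest to it in intensity."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(rows):
        for j in range(cols):
            best = None  # (intensity difference, neighbor value)
            for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= a < rows and 0 <= b < cols:
                    diff = abs(img[a][b] - img[i][j])
                    if best is None or diff < best[0]:
                        best = (diff, img[a][b])
            if best is not None:
                out[i][j] = best[1]
    return out


spike = [[0, 0, 0],
         [0, 100, 0],
         [0, 0, 0]]
cleaned = substitute_nearest(spike)
```

In this example the isolated value 100 is replaced by a neighboring 0 in a single pass, whereas an averaging step with n = 2 would leave a residue of 50 at that pixel.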

It has been our experience for noisy IR images that
only one accelerated PGA step is needed to remove
noise, especially salt-and-pepper noise. The reason
for this lies in the fact that for $\omega = 2$ and n = 2,
accelerated PGA substitutes a nearby pixel value
rather than averaging. This eliminates isolated noise
spikes instead of reducing them by the averaging factor
1/n as would be the case in unaccelerated PGA
with $\omega = 1$.


5  Conclusion 

PDE image processing is a new science that has
the potential to completely rework our understanding
of images and their structural content. This paper
has surveyed three methods in this area and their
relations to each other. Variational approaches such
as the Mumford-Shah method provide control over
image approximation, smoothing and boundary extent.
These methods are very successful but require
significant processing time; as such they can benefit
from prefiltering methods such as peer group averaging
and shock filtering. The latter two methods
have a number of other applications. Surprisingly
they are equivalent for 1-D signals for particular
parameter choices, even though they
are derived from completely different points of view,
with shock filtering growing out of front propagation
ideas and PGA being motivated by a desire to
work with images on a discrete rather than continuous
level. These three approaches should be viewed
not as competitive but rather as mutually supportive,
with the method of choice determined by the processing
needs of the particular application.